load

This is the main module for loading CoNLL treebank resources. It defines methods for loading a CoNLL treebank from a string, a file, or a URL. It also provides methods that iterate over the CoNLL resource data instead of storing the entire Conll object in memory.

Note that the fully qualified name is pyconll.load, but these methods can also be accessed using the pyconll namespace.

Example

This example counts the tokens in the treebank whose lemma is 'linguistic'. If all the operations performed on the CoNLL file are read-only or are data aggregations, the iter_from alternatives are more efficient and are recommended. These methods return an iterator over the sentences in the CoNLL resource rather than storing the Conll object in memory, which is convenient for large files that do not need to be completely loaded.

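import pyconll

# Path to a local CoNLL-U treebank file.
example_treebank = '/home/myuser/englishdata.conll'
# iter_from_file yields sentences lazily instead of building a Conll object.
conll = pyconll.iter_from_file(example_treebank)

# Count the tokens whose lemma is 'linguistic'.
count = 0
for sentence in conll:
    for word in sentence:
        if word.lemma == 'linguistic':
            count += 1

print(count)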
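
The same single-pass pattern extends to other aggregations. The following is a minimal sketch, not part of pyconll: lemma_counts is a hypothetical helper, and the demo data uses stand-in objects that only mimic the lemma attribute used in the example above, so the block runs without a treebank file.

```python
from collections import Counter
from types import SimpleNamespace


def lemma_counts(sentences):
    """Tally every lemma in one pass over an iterable of sentences.

    Hypothetical helper: it only assumes that each sentence is iterable
    and that each token exposes a `lemma` attribute.
    """
    counts = Counter()
    for sentence in sentences:
        for token in sentence:
            counts[token.lemma] += 1
    return counts


# Stand-in data (assumption): lists of objects with a `lemma` attribute
# take the place of the sentences a treebank iterator would yield.
demo_sentences = [
    [SimpleNamespace(lemma='linguistic'), SimpleNamespace(lemma='theory')],
    [SimpleNamespace(lemma='linguistic')],
]

demo_counts = lemma_counts(iter(demo_sentences))
print(demo_counts['linguistic'])
```

With pyconll installed, passing pyconll.iter_from_file(example_treebank) as the argument should work the same way, since the helper never needs the whole treebank in memory at once.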

API

A wrapper around the Conll class that allows for easy loading of treebanks from multiple sources. This module also contains logic for iterating over treebank data without storing Conll objects in memory.

pyconll.load.iter_from_file(filename)

Iterate over a CoNLL-U file’s sentences.

Parameters:

filename – The name of the file whose sentences should be iterated over.

Yields:

The sentences that make up the CoNLL-U file.

Raises:
  • IOError – If there is an error opening the file.
  • ParseError – If there is an error parsing the input into a Conll object.
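
As a rough illustration of what iterating sentence by sentence involves, the sketch below walks a file-like object lazily and groups its lines into sentence blocks, relying on the CoNLL-U convention that a blank line ends each sentence. It is not pyconll's parser: iter_sentence_blocks and the sample data are hypothetical, and io.StringIO stands in for an open file.

```python
import io


def iter_sentence_blocks(lines):
    """Yield each sentence as a list of its raw, non-blank lines.

    Hypothetical helper: only one sentence block is held in memory at a
    time, which is the point of iterating instead of loading.
    """
    block = []
    for line in lines:
        line = line.rstrip('\n')
        if line.strip():
            block.append(line)
        elif block:
            # A blank line closes the current sentence.
            yield block
            block = []
    if block:
        # Final sentence when the input lacks a trailing blank line.
        yield block


# Made-up two-sentence sample in the ten-column CoNLL-U layout.
SAMPLE = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
    "# text = Bye\n"
    "1\tBye\tbye\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

# io.StringIO stands in for an open treebank file.
block_sizes = [len(block) for block in iter_sentence_blocks(io.StringIO(SAMPLE))]
print(block_sizes)
```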

pyconll.load.iter_from_string(source)

Iterate over a CoNLL-U string’s sentences.

Use this method if you only need to iterate over the CoNLL-U string once and do not need to create or store the Conll object.

Parameters:

source – The CoNLL-U string.

Yields:

The sentences that make up the CoNLL-U string.

Raises:
  • ParseError – If there is an error parsing the input into a Conll object.
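
The "iterate once" caveat follows from how Python generators behave. The sketch below uses plain Python stand-ins rather than pyconll itself (sentence_stream and the sample data are hypothetical): a generator is exhausted after one pass, whereas a fully loaded collection can be traversed repeatedly.

```python
def sentence_stream(sentences):
    """Stand-in (assumption) for an iter_from_* call: yields one sentence at a time."""
    for sentence in sentences:
        yield sentence


data = [['Hello', 'world'], ['Bye']]

# One-shot iteration: the second pass finds nothing left to consume.
stream = sentence_stream(data)
first_pass = sum(1 for _ in stream)
second_pass = sum(1 for _ in stream)

# Materialising everything, as a load_from_* call would, allows repeated passes.
loaded = list(sentence_stream(data))
loaded_first = sum(1 for _ in loaded)
loaded_second = sum(1 for _ in loaded)

print(first_pass, second_pass, loaded_first, loaded_second)
```

When more than one pass over the data is needed, the load_from alternatives are therefore the more natural choice.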

pyconll.load.iter_from_url(url)

Iterate over a CoNLL-U file that is pointed to by a given URL.

Parameters:

url – The URL that points to the CoNLL-U file.

Yields:

The sentences that make up the CoNLL-U file.

Raises:
  • requests.exceptions.RequestException – If the URL could not be retrieved.
  • ParseError – If there is an error parsing the input into a Conll object.

pyconll.load.load_from_file(filename)

Load a CoNLL-U file given the filename where it resides.

Parameters:

filename – The location of the file.

Returns:

A Conll object equivalent to the provided file.

Raises:
  • IOError – If there is an error opening the given filename.
  • ParseError – If there is an error parsing the input into a Conll object.
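
One way to handle the file-opening error listed above is a small wrapper around the loader. This is a sketch under stated assumptions: try_load and read_text are hypothetical names, and read_text is a plain file read standing in for pyconll.load_from_file so that the block runs without pyconll installed.

```python
import os
import tempfile


def try_load(loader, location):
    """Call loader(location) and return (result, None) or (None, error).

    Hypothetical helper: only file-opening problems are caught here.
    """
    try:
        return loader(location), None
    except OSError as err:  # IOError is an alias of OSError in Python 3
        return None, err


def read_text(path):
    # Stand-in loader (assumption): a plain file read instead of pyconll.
    with open(path, encoding='utf-8') as handle:
        return handle.read()


# A path inside a fresh temporary directory is guaranteed not to exist.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing_path = os.path.join(tmp_dir, 'missing.conllu')
    result, error = try_load(read_text, missing_path)

print(type(error).__name__)
```

Passing pyconll.load_from_file as the loader should follow the same pattern. ParseError is deliberately left uncaught in this sketch, so malformed input still surfaces to the caller.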

pyconll.load.load_from_string(source)

Load CoNLL-U source in a string into a Conll object.

Parameters:

source – The CoNLL-U formatted string.

Returns:

A Conll object equivalent to the provided source.

Raises:
  • ParseError – If there is an error parsing the input into a Conll object.

pyconll.load.load_from_url(url)

Load a CoNLL-U file that is pointed to by a given URL.

Parameters:

url – The URL that points to the CoNLL-U file.

Returns:

A Conll object equivalent to the provided file.

Raises:
  • requests.exceptions.RequestException – If the URL could not be retrieved and the response status was 4xx or 5xx.
  • ParseError – If there is an error parsing the input into a Conll object.