load

This is the main module to use when loading an entire CoNLL file rather than individual sentences, which is the less common case. The API can load CoNLL data from a string or from a file. It can also iterate over the data sentence by sentence instead of holding a large CoNLL object in memory.

The fully qualified module name is pyconll.load, but the same functions are also available directly from the pyconll namespace, for example pyconll.iter_from_file.

Example

This example counts how many times a token with the lemma 'linguistic' appears in the treebank. If every operation on the CoNLL file is read-only, consider the iter_from_* functions, as this example does, instead of their load_from_* counterparts. They return an iterator over the sentences in the CoNLL file rather than storing an entire Conll object in memory, which is convenient for large files that do not need to persist.

import pyconll

example_treebank = '/home/myuser/englishdata.conll'
conll = pyconll.iter_from_file(example_treebank)

count = 0
for sentence in conll:
    for word in sentence:
        if word.lemma == 'linguistic':
            count += 1

print(count)

API

pyconll.load.iter_from_file(filename)

Iterate over a CoNLL-U file’s sentences.

Args: filename: The name of the file whose sentences should be iterated over.

Returns: An iterator that yields consecutive sentences.

pyconll.load.iter_from_string(source)

Iterate over a CoNLL-U string’s sentences.

Use this function if you only need to iterate over the CoNLL-U data once and do not need to create or store a Conll object.

Args: source: The CoNLL-U string.

Returns: An iterator that yields consecutive sentences.
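To make "an iterator that yields consecutive sentences" concrete, here is a minimal standard-library sketch of the idea. It is not pyconll's implementation: the helpers iter_sentence_blocks, lemma_of and row, and the sample data, are hypothetical names introduced only for illustration. It relies only on two properties of the CoNLL-U format: sentences are separated by blank lines, and each token line has ten tab-separated columns, with LEMMA in the third.

```python
from typing import Iterator, List


def row(*fields: str) -> str:
    # Join the ten CoNLL-U columns with tabs (hypothetical helper).
    return "\t".join(fields)


# Two tiny sentences, invented for this illustration.
SAMPLE = "\n".join([
    "# sent_id = 1",
    row("1", "Linguistic", "linguistic", "ADJ", "_", "_", "2", "amod", "_", "_"),
    row("2", "theory", "theory", "NOUN", "_", "_", "0", "root", "_", "_"),
    "",
    "# sent_id = 2",
    row("1", "Dogs", "dog", "NOUN", "_", "_", "2", "nsubj", "_", "_"),
    row("2", "bark", "bark", "VERB", "_", "_", "0", "root", "_", "_"),
    "",
])


def iter_sentence_blocks(source: str) -> Iterator[List[str]]:
    """Yield one sentence at a time as a list of its non-blank lines."""
    block: List[str] = []
    for line in source.splitlines():
        if line.strip():
            block.append(line)
        elif block:
            # A blank line closes the current sentence.
            yield block
            block = []
    if block:
        # The last sentence may not be followed by a blank line.
        yield block


def lemma_of(token_line: str) -> str:
    # LEMMA is the third tab-separated column of a token line.
    return token_line.split("\t")[2]


count = 0
for block in iter_sentence_blocks(SAMPLE):
    for line in block:
        if line.startswith("#"):
            continue  # skip sentence-level comment lines
        if lemma_of(line) == "linguistic":
            count += 1

print(count)  # prints 1
```

Because the generator hands back one sentence at a time, the caller never holds a collection of every parsed sentence at once. That is the trade-off the iter_from_* functions offer.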

pyconll.load.load_from_file(filename)

Load a CoNLL-U file given the filename where it resides.

Args: filename: The location of the file.

Returns: A Conll object equivalent to the provided file.

pyconll.load.load_from_string(source)

Load CoNLL-U source in a string into a Conll object.

Args: source: The CoNLL-U formatted string.

Returns: A Conll object equivalent to the provided source.