pyconll

Welcome to the pyconll documentation homepage.

pyconll is a flexible wrapper around the CoNLL-U format that makes it easy to load and manipulate dependency annotations. The example below shows pyconll’s syntax.

import pyconll

# Load the whole corpus from disk into memory, then iterate over it,
# printing each sentence id and collecting the unique verb lemmas.
verbs = set()
corpus = pyconll.load_from_file('ud-english-train.conllu')
for sentence in corpus:
    print(sentence.id)
    for token in sentence:
        if token.upos == 'VERB':
            verbs.add(token.lemma)

# For a larger corpus, use the iterator version, which yields sentences
# one at a time instead of holding the whole file in memory.
huge_corpus_iter = pyconll.iter_from_file('annotated_shakespeare.conllu')
for sentence in huge_corpus_iter:
    print(sentence.id)
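
For readers unfamiliar with the format being loaded above, the following standard-library sketch shows roughly what a CoNLL-U file contains: comment lines starting with `#`, one token per line with ten tab-separated fields, and a blank line after each sentence. It is not pyconll's implementation. The sample sentence, the `FIELDS` tuple, and the `parse_sentences` helper are all hypothetical names introduced for illustration, and the sketch ignores details such as multiword tokens and empty nodes.

```python
# Stdlib-only illustration of CoNLL-U data; NOT how pyconll is implemented.
# SAMPLE, FIELDS, and parse_sentences are made up for this sketch.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = Dogs chase cats.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tchase\tchase\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "3\tcats\tcat\tNOUN\tNNS\tNumber=Plur\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def parse_sentences(text):
    """Yield (sentence_id, tokens) pairs; each token is a dict keyed by FIELDS."""
    sent_id, tokens = None, []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                yield sent_id, tokens
            sent_id, tokens = None, []
        elif line.startswith("#"):
            # Comment lines carry metadata such as "# sent_id = ...".
            key, _, value = line[1:].partition("=")
            if key.strip() == "sent_id":
                sent_id = value.strip()
        else:
            # Token lines hold ten tab-separated fields.
            tokens.append(dict(zip(FIELDS, line.split("\t"))))
    if tokens:
        # Handle input that lacks a trailing blank line.
        yield sent_id, tokens


# Mirror the pyconll example: record sentence ids and collect verb lemmas.
verbs = set()
sentence_ids = []
for sent_id, tokens in parse_sentences(SAMPLE):
    sentence_ids.append(sent_id)
    for token in tokens:
        if token["upos"] == "VERB":
            verbs.add(token["lemma"])

print(sentence_ids, verbs)
```

Because `parse_sentences` is a generator, it processes one sentence at a time, which is the same general idea behind preferring an iterator over a full in-memory load for large corpora. For real work, pyconll's own loaders remain the right tool; the sketch only makes the underlying data layout concrete.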

Those new to the project should visit the Getting Started page, which walks through an end-to-end example using pyconll. To load a file, see the load page. For API usage, consult the conll, sentence, and token module pages, which document the base data types. Module documentation, guidance pages, and more are listed in the table of contents below.

For more information, the GitHub project page has examples, tests, and source code.