pyconll
===================================

Welcome to the ``pyconll`` documentation homepage. pyconll is a flexible wrapper around the CoNLL-U format (and other tabular formats) that makes it easy to load and manipulate dependency annotations. An example of pyconll's syntax is shown below.

.. code:: python

    from pyconll.conllu import conllu

    # Load from disk into memory and iterate over the corpus, printing
    # sentence ids and collecting the lemmas of all unique verbs
    verbs = set()
    corpus = conllu.load_from_file('ud-english-train.conllu')
    for sentence in corpus:
        print(sentence.meta.get('sent_id'))
        for token in sentence.tokens:
            if token.upos == 'VERB':
                verbs.add(token.lemma)

    # Use the iterator version over a larger corpus to save memory
    huge_corpus_iter = conllu.iter_from_file('annotated_shakespeare.conllu')
    for sentence in huge_corpus_iter:
        print(sentence.meta.get('sent_id'))

Those new to the project should visit the `Getting Started`__ page, which walks through an end-to-end example using pyconll. For loading files, visit the format_ page. For API usage, consult the sentence_, token_, and schema_ module pages, which document the base data types. Module documentation, guidance pages, and more are listed in the table of contents below.

For more information, the github_ project page has examples, tests, and source code.

.. toctree::
   :maxdepth: 2
   :caption: Contents
   :titlesonly:

   Getting Started
   Migration Guide
   Custom Formats
   README
   CHANGELOG
   pyconll/conllu
   pyconll/exception
   pyconll/format
   pyconll/schema
   pyconll/sentence
   pyconll/token
   pyconll/tree

.. _github: https://github.com/matgrioni/pyconll/
.. _format: pyconll/format.html
.. _schema: pyconll/schema.html
.. _conllu: pyconll/conllu.html
.. _sentence: pyconll/unit/sentence.html
.. _token: pyconll/unit/token.html
__ starting.html