sentence

The Sentence module represents an entire CoNLL sentence. A sentence is composed of two main parts, the comments and the tokens.

Comments

Comments are treated as key-value pairs, where the key and value are separated by =. If no = is present, the comment is treated as a singleton and its value is None. To read and write these values, use the methods related to meta (the metadata of the sentence).
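
The key-value rule described above can be sketched in a few lines of plain Python. This is only an illustration of the described behavior, not pyconll's actual parser; the helper name parse_comment and its whitespace handling are assumptions made for this example.

```python
def parse_comment(line):
    """Split a CoNLL-U comment line into a (key, value) pair.

    Hypothetical helper for illustration; not part of pyconll.
    """
    # Drop the leading '#' marker and surrounding whitespace.
    body = line.lstrip('#').strip()
    # Split on the first '=' only, so values may themselves contain '='.
    key, sep, value = body.partition('=')
    if not sep:
        # No '=' present: a singleton comment, whose value is None.
        return key.strip(), None
    return key.strip(), value.strip()


comments = ['# sent_id = doc1-s1', '# text = Hello world.', '# newpar']
meta = dict(parse_comment(c) for c in comments)
```

Here meta maps sent_id and text to their string values, while the singleton newpar maps to None.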

Keep in mind that the id and text of a sentence can be accessed directly through member properties rather than through method APIs: use sentence.id rather than sentence.meta_value('sent_id'). Because this API focuses on the annotation of tokens and does not support changing their forms, the text value of a sentence cannot be changed, but all other meta values can be.

Document and Paragraph ID

The document and paragraph ids of a sentence are automatically inferred from a CoNLL treebank based on the comments on each sentence. To reassign these ids, you must do so at the sentence level; there is no API that simplifies mass assignment.
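
One way to picture this inference is as ids being carried forward from the sentence that declares them to the sentences that follow. The sketch below assumes the UD convention of "newdoc id" and "newpar id" comments and represents each sentence only by its comment dict; the function infer_ids is hypothetical, and pyconll's own logic may differ (for example, in whether a new document resets the paragraph id).

```python
def infer_ids(sentence_metas):
    """Carry document and paragraph ids forward across sentences.

    sentence_metas: list of comment dicts, one per sentence, in file order.
    Returns a list of (doc_id, par_id) tuples, one per sentence.
    Illustrative only; assumes 'newdoc id' and 'newpar id' comment keys.
    """
    doc_id = par_id = None
    result = []
    for meta in sentence_metas:
        # A sentence that opens a document or paragraph updates the running id.
        if 'newdoc id' in meta:
            doc_id = meta['newdoc id']
        if 'newpar id' in meta:
            par_id = meta['newpar id']
        result.append((doc_id, par_id))
    return result


metas = [
    {'newdoc id': 'd1', 'newpar id': 'p1', 'sent_id': 's1'},
    {'sent_id': 's2'},
    {'newpar id': 'p2', 'sent_id': 's3'},
]
ids = infer_ids(metas)
```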

Tokens

Tokens are the meat of the sentence. They can be accessed either by their id, given as a string exactly as defined in the CoNLL data, or by numeric index. String id indexing makes it easy to include multitoken and null nodes. The same indexing syntax understands both sentence['2-3'] and sentence[2].
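
The dual indexing can be illustrated with a toy container. MiniSentence below is a hypothetical stand-in, not pyconll's Sentence class, and it assumes that integer keys are zero-based positions in the token list; check the behavior of your pyconll version before relying on that.

```python
class MiniSentence:
    """Toy container showing dual indexing; not pyconll's Sentence class."""

    def __init__(self, token_ids):
        # Tokens are kept in file order; here each token is just its id string.
        self._tokens = list(token_ids)
        # Map each CoNLL id string to its position for id-based lookup.
        self._by_id = {tid: i for i, tid in enumerate(self._tokens)}

    def __getitem__(self, key):
        if isinstance(key, str):
            # String keys are CoNLL ids, such as '2-3' for a multitoken range.
            return self._tokens[self._by_id[key]]
        # Integer keys are positions in the token list (assumed zero-based).
        return self._tokens[key]


sent = MiniSentence(['1', '2-3', '2', '3', '4'])
by_id = sent['2-3']   # lookup by CoNLL id string
by_index = sent[2]    # lookup by position in the token list
```

Keeping a separate id-to-position map is what lets range ids such as '2-3' sit in the same sequence as ordinary tokens.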

API

class pyconll.unit.sentence.Sentence(source, _start_line_number=None, _end_line_number=None)

A sentence in a CoNLL-U file. A sentence consists of several components.

First are the comments. Per the UD v2 guidelines, each sentence must have two comments, sent_id and text. Comments are stored as a dict in the meta field. For singleton comments with no key-value structure, the value in the dict is None.

Note that, for usability and because of their importance as comments, the sent_id field is also assigned to the id property and the text field to the text property. The text property is read-only, as are the paragraph and document ids. The paragraph and document ids are read-only because they are defined across multiple sentences rather than per Sentence; they can instead be changed by changing the metadata of the Sentences.

Then come the token annotations. Each sentence is made up of many token lines that annotate the text provided. While a sentence usually means a collection of tokens, in the CoNLL-U sense it is more useful to think of it as a collection of annotations with some associated metadata. Therefore the text of the sentence cannot be changed with this class; only the associated annotations can be changed.

conll()

Convert the sentence to a CoNLL-U representation.

Returns: A string representing the Sentence in CoNLL-U format.

doc_id

Get the document id associated with this Sentence. Read-only.

Returns: The document id or None if no id is associated.

id

Get the sentence id.

Returns: The sentence id. If there is none, then returns None.

meta_present(key)

Check if the key is present as a singleton or as a pair.

Args: key: The value to check for in the comments.

Returns: True if the key was provided as a singleton or as a key value pair. False otherwise.

meta_value(key)

Returns the value associated with the key in the metadata (comments).

Args: key: The key whose value to look up.

Returns: The value associated with the key as a string. If the key is not present, a KeyError is raised. If the key is a singleton, None is returned.
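
The three outcomes described for meta_value and meta_present can be modeled with an ordinary dict. This is a sketch of the documented semantics only; it uses a plain dict in place of a Sentence, and the sample keys are made up for illustration.

```python
# Stand-in for a sentence's comments: one key-value pair and one singleton.
meta = {'sent_id': 'doc1-s1', 'newpar': None}

# A key-value comment yields its string value.
value = meta['sent_id']

# A singleton comment is present, but its value is None.
singleton_present = 'newpar' in meta
singleton_value = meta['newpar']

# An absent key raises KeyError rather than returning None,
# which is how a missing comment is told apart from a singleton.
try:
    meta['missing']
    missing_raised = False
except KeyError:
    missing_raised = True
```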

par_id

Get the paragraph id associated with this Sentence. Read-only.

Returns: The paragraph id or None if no id is associated.

set_meta(key, value=None)

Set the metadata or comments associated with this Sentence.

Args: key: The key for the comment. value: The value to associate with the key. If the comment is a singleton, this field can be ignored or set to None.

text

Get the continuous text for this sentence. Read-only.

Returns: The continuous text of this sentence. If none is provided in comments, then None is returned.