util

This module provides additional, commonly needed methods that build on the API layer. It only adds logic on top of the API rather than extending it. The module is currently small and will be extended as needed.

API

A set of utilities for dealing with pyconll-defined types. This is simply a collection of functions.

pyconll.util.find_ngrams(conll: Iterable[pyconll.unit.sentence.Sentence], ngram: Sequence[str], case_sensitive: bool = True) → Iterator[Tuple[pyconll.unit.sentence.Sentence, int, List[pyconll.unit.token.Token]]]

Find the occurrences of the ngram in the provided Conll collection.

This method yields each sentence that contains the ngram, along with the position of the token in that sentence that starts the ngram. The matching algorithm does not currently account for multiword tokens, so “don’t” should be separated into “do” and “not” in the input.

Parameters
  • conll – The corpus in which to search for the ngram across the sentences.

  • ngram – The ngram to search for. A sequence of the lemmas, one string per token.

  • case_sensitive – Flag to indicate whether the ngram search should be case sensitive. The case-insensitive comparison is currently a locale-insensitive lowercase comparison.

Returns

An iterator of tuples over the ngrams in the Conll object. The first element is the sentence, the second element is the numeric index of the token that starts the ngram, and the last element is the list of token references from the sentence that make up the match. This list does not include any multiword tokens that were skipped over.
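The matching behavior described above can be illustrated with a small standalone sketch. It does not use pyconll and is not its implementation: `Tok` and `find_ngrams_sketch` are hypothetical stand-ins, the comparison against the token's surface form is an assumption, and so is the choice to report the index as a position in the full token list (multiword tokens included).

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple


@dataclass
class Tok:
    """Hypothetical stand-in for a token: a CoNLL-U id and a surface form."""
    id: str
    form: str

    def is_multiword(self) -> bool:
        # CoNLL-U multiword tokens carry a range id such as "2-3".
        return "-" in self.id


def find_ngrams_sketch(
    sentences: Sequence[List[Tok]],
    ngram: Sequence[str],
    case_sensitive: bool = True,
) -> Iterator[Tuple[List[Tok], int, List[Tok]]]:
    """Yield (sentence, start index, matched tokens) for each ngram match."""

    def norm(text: str) -> str:
        # Plain str.lower(), i.e. a locale-insensitive lowercase comparison.
        return text if case_sensitive else text.lower()

    target = [norm(word) for word in ngram]
    if not target:
        return
    for sentence in sentences:
        # Only non-multiword tokens take part in matching.
        positions = [i for i, tok in enumerate(sentence) if not tok.is_multiword()]
        for k in range(len(positions) - len(target) + 1):
            window = [sentence[p] for p in positions[k:k + len(target)]]
            if [norm(tok.form) for tok in window] == target:
                yield sentence, positions[k], window


# "I don't know", with "don't" split into "do" and "not".
sentence = [
    Tok("1", "I"), Tok("2-3", "don't"), Tok("2", "do"),
    Tok("3", "not"), Tok("4", "know"),
]
for _, start, tokens in find_ngrams_sketch([sentence], ["i", "do"], case_sensitive=False):
    print(start, [tok.form for tok in tokens])
```

In this sketch the match for "i do" starts at the first token and the returned token list skips the multiword token "don't", mirroring the note that skipped multiword tokens are not part of the result.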

pyconll.util.find_nonprojective_deps(sentence: pyconll.unit.sentence.Sentence) → List[Tuple[pyconll.unit.token.Token, pyconll.unit.token.Token]]

Find the nonprojective dependency pairs in the provided sentence.

Dependencies are returned as a list of ordered pairs, each representing one non-projective dependency pair. Each element of a pair is a token, which stands for the dependency between that token and its governor. In other words, each token is the dependent end of its own dependency, and the two tokens’ dependencies cross each other in a non-projective way.

Parameters

sentence – The sentence to check for nonprojective dependency pairs.

Returns

A list of pairs which represent the children of a nonprojective dependency pair.
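The notion of two dependencies crossing can be made concrete with a short standalone sketch. It is not pyconll's implementation: `crossing_pairs` is a hypothetical helper that works on a plain list of head indices instead of a Sentence, and ignoring arcs from the artificial root (head 0) is an assumption made here for simplicity.

```python
from itertools import combinations
from typing import List, Tuple


def crossing_pairs(heads: List[int]) -> List[Tuple[int, int]]:
    """Return pairs of dependents (1-based ids) whose arcs cross.

    heads[i] is the head id of token i + 1; 0 marks the root.
    """
    arcs = []
    for dep, head in enumerate(heads, start=1):
        if head != 0:  # assumption: arcs from the artificial root are ignored
            arcs.append((min(dep, head), max(dep, head), dep))

    pairs = []
    for (l1, r1, d1), (l2, r2, d2) in combinations(arcs, 2):
        # Two arcs cross when exactly one endpoint of one arc lies
        # strictly inside the span of the other.
        if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
            pairs.append((d1, d2))
    return pairs


# Token 1 attaches to 3 and token 2 attaches to 4, so their arcs cross.
print(crossing_pairs([3, 4, 0, 3]))
# A projective tree produces no pairs.
print(crossing_pairs([2, 0, 2]))
```

Each returned pair names the two dependents, matching the description above in which each token stands for the dependency to its governor.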