This module provides some useful functionality on top of pyconll. It adds logic on top of the API layer rather than extending it. The module is currently sparse, but it can easily be extended as demand arises.


pyconll.util.find_ngrams(conll, ngram, case_sensitive=True)

Find the occurrences of the ngram in the provided Conll collection.

This method returns every sentence that contains the ngram, along with the position of the token in that sentence where the ngram starts. The matching algorithm does not currently account for multiword tokens, so "don't" should be separated into "do" and "not" in the input.

Parameters:

  • conll – The Conll collection in which to search for the ngram.
  • ngram – The ngram to search for, given as a random-access (indexable) sequence such as a list.
  • case_sensitive – Flag indicating whether the ngram search is case sensitive.

Returns:

An iterator over the ngram matches in the Conll object. Each item is a pair whose first element is the sentence and whose second element is the numeric index of the token that starts the ngram.
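To make the matching behaviour concrete, here is a minimal pure-Python sketch of this kind of search. It is not pyconll's implementation: it assumes each sentence is a plain list of word forms (a stand-in for pyconll's sentence and token objects), and the function name `find_ngrams_sketch` is hypothetical. Like the documented function, it does not model multiword tokens, so contractions must already be split into separate words.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

# Assumption: a sentence is just a list of word forms. In pyconll it would be
# a sentence object made of tokens; this stand-in keeps the sketch runnable.
Sentence = List[str]


def find_ngrams_sketch(
    corpus: Iterable[Sentence],
    ngram: Sequence[str],
    case_sensitive: bool = True,
) -> Iterator[Tuple[Sentence, int]]:
    """Yield (sentence, start_index) for each occurrence of `ngram`.

    Hypothetical illustration only. Multiword tokens are not handled, so
    "don't" must already appear as "do" and "not".
    """
    n = len(ngram)
    if n == 0:
        # This sketch simply reports no matches for an empty ngram.
        return
    if not case_sensitive:
        # Normalize the query once; sentence words are normalized per sentence.
        ngram = [word.lower() for word in ngram]
    for sentence in corpus:
        words = sentence if case_sensitive else [w.lower() for w in sentence]
        # Try every start position at which the whole ngram still fits.
        for start in range(len(words) - n + 1):
            if all(words[start + k] == ngram[k] for k in range(n)):
                # Yield the original (un-normalized) sentence with the index.
                yield sentence, start


corpus = [
    ["I", "do", "not", "know", "."],
    ["Do", "not", "go", "."],
]
for sentence, idx in find_ngrams_sketch(corpus, ["do", "not"], case_sensitive=False):
    print(idx, " ".join(sentence))
# prints:
# 1 I do not know .
# 0 Do not go .
```

With the default `case_sensitive=True`, only the first sentence would match, because "Do" differs from "do". The sketch tries every start position, so overlapping occurrences are each reported.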