tree

Tree is a simple, generic tree data structure for representing hierarchical relationships between tokens, such as dependency trees. Each Tree node can have multiple children and at most one parent.

Overview

The tree module provides:

  • Tree[T] - A generic tree node containing data of type T

  • from_tokens() - A function to build trees from sequences of tokens

Structure

A Tree has the following key components:

  • data: T - The data stored at this node (e.g., a Token)

  • parent: Optional[Tree[T]] - The parent node (None for root)

  • __getitem__(i) - Access children by index

  • __iter__() - Iterate over children

  • __len__() - Number of children

Creating Trees

Generic Tree Creation

Use tree.from_tokens() to create trees from any sequence of tokens:

from pyconll.tree import from_tokens

tree = from_tokens(
    tokens=my_tokens,
    root_id='0',                            # Root parent ID
    to_id=lambda t: t.id,                   # Extract token ID
    to_head=lambda t: t.head,               # Extract parent ID
    skip=lambda t: '-' in t.id              # Skip multiword tokens
)

CoNLL-U Tree Creation

For the CoNLL-U model, Sentences have a to_tree method which can be used directly.

from pyconll.conllu import conllu

sentences = conllu.load_from_file('train.conllu')

for sentence in sentences:
    tree = sentence.to_tree()

    # Tree root is the token with head="0"
    root_token = tree.data
    print(f"Root: {root_token.form}")

    # Iterate over dependents
    for child_tree in tree:
        child_token = child_tree.data
        print(f"  Dependent: {child_token.form}")

Traversing Trees

from pyconll.conllu import conllu

sentences = conllu.load_from_file('train.conllu')
tree = sentences[0].to_tree()

# Access root data
root = tree.data
print(f"Root word: {root.form}, POS: {root.upos}")

# Iterate over direct children
for child_tree in tree:
    child = child_tree.data
    print(f"Dependent: {child.form} ({child.deprel})")

    # Descend one more level into this child's subtree
    for grandchild_tree in child_tree:
        grandchild = grandchild_tree.data
        print(f"  Grandchild: {grandchild.form}")

# Access children by index
if len(tree) > 0:
    first_child = tree[0]
    print(f"First dependent: {first_child.data.form}")
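
The loops above stop at grandchildren. For trees of arbitrary depth, a recursive walk is one option. The sketch below relies only on the data property and child iteration described on this page. The Node class is a hypothetical stand-in so the snippet runs without a corpus file, and walk is an illustrative helper, not part of pyconll.

```python
class Node:
    """Hypothetical stand-in exposing the same read interface as Tree."""

    def __init__(self, data, children=()):
        self.data = data
        self._children = list(children)

    def __iter__(self):
        return iter(self._children)

    def __len__(self):
        return len(self._children)


def walk(tree, depth=0):
    """Yield (depth, data) for every node, parents before children."""
    yield depth, tree.data
    for child in tree:
        yield from walk(child, depth + 1)


# Toy dependency tree for "She reads old books".
tree = Node("reads", [Node("She"), Node("books", [Node("old")])])

# Print each node indented by its depth in the tree.
for depth, form in walk(tree):
    print("  " * depth + form)
```

With a real pyconll tree, the same helper should work unchanged as long as the nodes expose data and iterate over their children; the yielded data would then be Token objects rather than strings.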

Example: Finding Non-Projective Dependencies

from pyconll.conllu import conllu

def has_nonprojective(tree):
    """Check if tree has non-projective dependencies.

    A tree is projective when every subtree covers a contiguous span of
    token IDs. Nodes whose ID is not a plain integer are ignored.
    """
    nonprojective = False

    def subtree_ids(node):
        # Collect the integer IDs of this node and all of its descendants.
        nonlocal nonprojective
        ids = [int(node.data.id)] if node.data.id.isdigit() else []
        for child in node:
            ids.extend(subtree_ids(child))
        # A gap in the span means a token inside it attaches elsewhere.
        if ids and max(ids) - min(ids) + 1 != len(ids):
            nonprojective = True
        return ids

    subtree_ids(tree)
    return nonprojective

sentences = conllu.load_from_file('train.conllu')
for sentence in sentences:
    tree = sentence.to_tree()
    if has_nonprojective(tree):
        print(f"Non-projective: {sentence.meta['sent_id']}")

API

A general immutable tree module. This module is used when parsing a serial sentence into a Tree structure.

class pyconll.tree.Tree(data: T)

A tree node. This is the base representation for a tree, which can have many children that are accessible by child index. The tree's structure is immutable, so the data, parent, and children cannot be changed once created.

On its own, a single Tree node is of little use. Larger trees must be created with TreeBuilder, which acts as a sort of friend class of Tree so that Tree can maintain its immutable public contract.

__getitem__(key: int) -> Tree[T]
__getitem__(key: slice) -> list[Tree[T]]

Get specific children from the Tree. The key can be an integer, which returns a single child, or a slice, which returns a list of children.

Parameters:

key – The indexer for the item.

__init__(data: T) -> None

Create a tree node holding the given data. To create a larger Tree, use TreeBuilder.

Parameters:

data – The data to put with the Tree node.

__iter__() -> Iterator[Tree[T]]

Provides an iterator over the children.

__len__() -> int

Provides the number of direct children on the tree.

Returns:

The number of direct children on the tree.

property data: T

The data on the tree node. The property ensures it is read-only.

Returns:

The data stored on the Tree.

property parent: Tree[T] | None

Provides the parent of the Tree. The property ensures it is read-only.

Returns:

A reference to the parent Tree, or None if there is no parent.
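
Because parent is None only at the root, following it repeatedly gives the chain of ancestors for any node. A minimal sketch, assuming only the data and parent properties documented here; Node is a hypothetical stand-in for Tree and path_to_root is an illustrative helper, not a pyconll function.

```python
class Node:
    """Hypothetical stand-in with the same data/parent attributes as Tree."""

    def __init__(self, data, parent=None):
        self.data = data
        self.parent = parent


def path_to_root(tree):
    """Return the data of tree and of each ancestor, ending at the root."""
    path = []
    node = tree
    while node is not None:
        path.append(node.data)
        node = node.parent
    return path


# Toy chain: "old" depends on "books", which depends on the root "reads".
root = Node("reads")
books = Node("books", parent=root)
old = Node("old", parent=books)

print(path_to_root(old))  # ['old', 'books', 'reads']
```

The length of the returned list minus one is the depth of the node, which can be handy when measuring how deeply a token is embedded.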

pyconll.tree.from_tokens(tokens: Sequence, root_id: I, to_id: Callable[[K], I], to_head: Callable[[K], I], skip: Callable[[K], bool] | None = None) -> Tree

A fully generic function for creating a Tree structure from a sequence of tokens.

This can be used for tokens other than the pre-defined CoNLL-U schema.

Parameters:
  • tokens – The tokens to create the tree from.

  • root_id – The id that the root token's head points to. The token whose head equals this id becomes the root of the tree.

  • to_id – The mapper from the token to its id.

  • to_head – The mapper from the token to the id of its parent.

  • skip – The optional guard to skip certain tokens that may not participate in the Tree structure.
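
To make the roles of root_id, to_id, to_head, and skip concrete, here is a minimal sketch of the head-to-dependents grouping that a builder of this kind needs. It is not pyconll's implementation: Tok and group_by_head are hypothetical names introduced for illustration, and the result is a plain dictionary rather than a Tree.

```python
from collections import defaultdict
from typing import NamedTuple


class Tok(NamedTuple):
    """Hypothetical token type; only the id and head fields matter here."""
    id: str
    head: str
    form: str


def group_by_head(tokens, to_id, to_head, skip=None):
    """Map each head id to the ids of its direct dependents, in token order."""
    children = defaultdict(list)
    for token in tokens:
        if skip is not None and skip(token):
            continue  # e.g. multiword token ranges such as "1-2"
        children[to_head(token)].append(to_id(token))
    return dict(children)


tokens = [
    Tok("1-2", "_", "vámonos"),  # multiword token, skipped below
    Tok("1", "0", "vamos"),
    Tok("2", "1", "nos"),
]

links = group_by_head(
    tokens,
    to_id=lambda t: t.id,
    to_head=lambda t: t.head,
    skip=lambda t: "-" in t.id,
)
print(links)  # {'0': ['1'], '1': ['2']}
```

In this toy input, the token listed under '0' is the one that would become the root when root_id is '0', and every other key is the id of a token with dependents.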