How to merge three CoNLL-U files with the conllu Python library?


This is my first time working with CoNLL-U files. I can't find a way to merge these files with the conllu Python library. Any leads would be helpful. Thanks.

There is 1 answer

Emil Stenström On

Each time you call parse() you get a list of TokenLists back, and parse_incr() yields the same TokenLists one at a time. Merging several files can therefore be done by collecting the TokenLists from every file into a single list.

Example:

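from io import open  # Needed on Python 2 only; on Python 3 this is the built-in open()
from conllu import parse_incr

files = ["file1.conllu", "file2.conllu", "file3.conllu"]

merged_tokenlists = []
for file in files:
    # Open the file from the current iteration and close it when done
    with open(file, "r", encoding="utf-8") as data_file:
        # parse_incr() yields one TokenList per sentence
        for tokenlist in parse_incr(data_file):
            merged_tokenlists.append(tokenlist)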

Author of the conllu library here, happy to see people using it!