spaCy process document with multiple languages


Given a document string s of a certain length and a language mask l of the same length, I would like to process each part (span?) of the document with the corresponding spaCy language model.

Say, for example:

s = 'As one would say in German: Wie man auf englisch zu sagen pflegt'
l = ['en'] * 27 + ['de'] * 37

I would like to construct a document out of

import spacy
nlp_de = spacy.load('de')
nlp_en = spacy.load('en')

d_de = nlp_de(u"".join([c for i,c in enumerate(s) if l[i] == "de"]))
d_en = nlp_en(u"".join([c for i,c in enumerate(s) if l[i] == "en"]))

And now I would somehow have to glue those two parts together. Unfortunately, each spaCy document holds a reference to its own vocabulary, so merging them would be ambiguous.

How should I model my multi-language documents with spacy?

1 Answer

lazary

Two thoughts regarding this:

  1. Code-switching: the mixing of more than one language within (mainly, but not restricted to) spoken text. This is not exactly your example.
  2. Sentences like yours, which are more or less separable.

If most of your text is like your example, I would try to separate the text by language (for your example, that would yield two sentences, each processed on its own).
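The "separate and process each part" idea can be sketched with the language mask from the question (a minimal helper, not a spaCy API; the splitting logic and the name split_by_language are my own):

```python
from itertools import groupby

def split_by_language(s, mask):
    """Yield (lang, text) for each contiguous same-language run in the mask."""
    i = 0
    for lang, run in groupby(mask):
        n = sum(1 for _ in run)  # length of this run
        yield lang, s[i:i + n]
        i += n

s = 'As one would say in German: Wie man auf englisch zu sagen pflegt'
l = ['en'] * 27 + ['de'] * 37

chunks = list(split_by_language(s, l))
# [('en', 'As one would say in German:'),
#  ('de', ' Wie man auf englisch zu sagen pflegt')]
```

Each chunk can then be handed to the matching pipeline (e.g. nlp_en(text) or nlp_de(text)), keeping the results as separate Doc objects instead of trying to glue them into one.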

If it's the other case, I'm not sure whether spaCy has built-in support for code-switching; if not, you'll need to build your own models (or try to combine spaCy's existing ones), depending on your actual task.