How does the full-text search Snowball algorithm interpret words of an unspecified language?

143 views Asked by kvdm.dev At 27 May 2021 at 17:20

I am building a full-text search index with SQLite and don't understand what happens internally when I index documents that contain several languages.

For example, I describe a programming topic I'm learning in Russian and include code blocks in the description, with programming-language statements and comments that are obviously in English.

Let's consider the example document.txt

Вывод хранимых данных производится следующей командой

import storage
def main():  # Comments just to represent an example
    print(storage.data)

As you can see, document.txt contains two languages.

I use the snowball tokenizer (it reuses the standard Snowball library) to index the completed documents, explicitly specifying CREATE VIRTUAL TABLE documents USING FTS5(text, tokenize='snowball russian'); and it handles them with no issues. My question is: why? The documents contain English words, and afterwards the index contains English stems alongside Russian stems; I can search for команда or commenting successfully. Is this how it is supposed to work?
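To make the setup easier to reproduce, here is a minimal sketch in Python that indexes the same document and lists the terms that end up in the index. It is only an approximation of my setup: the snowball tokenizer is a separately loaded extension, so the sketch assumes the built-in 'porter unicode61' tokenizer (an English-only stemmer) as a stand-in, and it assumes the local SQLite build has FTS5 enabled. The helper names build_index, indexed_terms and search are made up for illustration.

```python
import sqlite3

# The example document from above: Russian prose plus an English code block.
DOCUMENT = """Вывод хранимых данных производится следующей командой

import storage
def main():  # Comments just to represent an example
    print(storage.data)
"""


def build_index(tokenizer: str = "porter unicode61") -> sqlite3.Connection:
    """Create an in-memory FTS5 table and index the example document.

    Assumption: the built-in porter tokenizer stands in for the external
    'snowball russian' tokenizer, which has to be loaded as an extension.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE VIRTUAL TABLE documents USING fts5(text, tokenize='{tokenizer}')"
    )
    conn.execute("INSERT INTO documents(text) VALUES (?)", (DOCUMENT,))
    # fts5vocab exposes the terms exactly as the tokenizer stored them.
    conn.execute(
        "CREATE VIRTUAL TABLE documents_vocab USING fts5vocab(documents, 'row')"
    )
    return conn


def indexed_terms(conn: sqlite3.Connection) -> list[str]:
    """Return every distinct term stored in the index."""
    rows = conn.execute("SELECT term FROM documents_vocab ORDER BY term")
    return [row[0] for row in rows]


def search(conn: sqlite3.Connection, term: str) -> list[int]:
    """Return the rowids of documents matching a single-word query."""
    rows = conn.execute(
        "SELECT rowid FROM documents WHERE documents MATCH ?", (term,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_index()
    print(indexed_terms(conn))
    for term in ("commenting", "командой", "команда"):
        print(term, search(conn, term))
```

With this stand-in tokenizer the English words are stemmed, while the Cyrillic words are stored without stemming, so only the exact form командой matches. Swapping the tokenizer argument for 'snowball russian' (after loading the extension) and comparing the output of indexed_terms should show which stems that tokenizer really writes to the index.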
