Is there a Lucene Analyzer that will ignore the difference between a Greek symbol and its phonetic English name?

Question

33 views · Asked by azulBonnet on 01 June 2023 at 05:25

Ideally, I'd like something that otherwise acts like a StandardAnalyzer but treats each Greek symbol as equivalent to its English phonetic spelling ("beta" == "β", "omega" == "ω"). I looked at the ICU analyzer, but it doesn't go quite that far. If no such analyzer exists, do you have a suggestion for the most efficient way to design one?

Original Q&A

There is 1 answer

**azulBonnet** · Accepted Answer · 2023-06-07T23:51:38+00:00

After researching @Val's suggestion, I put this together. I'm not sure it's quite right, but I'm saving it here in case anyone finds it a useful starting point.

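    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.CharFilters;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Util;

    private static Analyzer GetGreekSymbolAgnosticAnalyzer()
    {
        // Map each Greek symbol to its English name; extend with the remaining letters as needed.
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.Add("α", "alpha");
        builder.Add("β", "beta");
        builder.Add("ω", "omega");
        NormalizeCharMap norm = builder.Build();

        Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
        {
            // Unlike StandardAnalyzer, this chain has no LowerCaseFilter or StopFilter.
            Tokenizer tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
            return new TokenStreamComponents(tokenizer, new StandardFilter(LuceneVersion.LUCENE_48, tokenizer));
        },
        // The char filter rewrites the raw text before the tokenizer sees it.
        initReader: (fieldName, reader) => new MappingCharFilter(norm, reader));

        return analyzer;
    }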
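
One way to check whether the analyzer above does what the question asks is to run the symbol and the spelled-out name through it and compare the tokens. The following is only a sketch, assuming Lucene.NET 4.8 with the Lucene.Net.Analysis.Common package; the names `GreekAnalyzerDemo`, `BuildAnalyzer`, `Tokens` and the field name `"body"` are made up for illustration, and `BuildAnalyzer` simply repeats the chain from the answer.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.CharFilters;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class GreekAnalyzerDemo
{
    // Hypothetical helper: same char map and filter chain as the answer's analyzer.
    private static Analyzer BuildAnalyzer()
    {
        var builder = new NormalizeCharMap.Builder();
        builder.Add("α", "alpha");
        builder.Add("β", "beta");
        builder.Add("ω", "omega");
        NormalizeCharMap norm = builder.Build();

        return Analyzer.NewAnonymous(
            createComponents: (fieldName, reader) =>
            {
                Tokenizer tokenizer = new StandardTokenizer(LuceneVersion.LUCENE_48, reader);
                return new TokenStreamComponents(
                    tokenizer, new StandardFilter(LuceneVersion.LUCENE_48, tokenizer));
            },
            initReader: (fieldName, reader) => new MappingCharFilter(norm, reader));
    }

    // Hypothetical helper: collects the terms the analyzer emits for a piece of text.
    private static List<string> Tokens(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        // "body" is an arbitrary field name chosen for this sketch.
        using (TokenStream ts = analyzer.GetTokenStream("body", text))
        {
            ICharTermAttribute term = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();                      // required before the first IncrementToken()
            while (ts.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            ts.End();
        }
        return terms;
    }

    public static void Main()
    {
        Analyzer analyzer = BuildAnalyzer();
        // If the mapping works as intended, both lines should list the same terms.
        Console.WriteLine(string.Join(" | ", Tokens(analyzer, "β carotene")));
        Console.WriteLine(string.Join(" | ", Tokens(analyzer, "beta carotene")));
    }
}
```

Because the mapping happens in a char filter, it is applied to the raw text at both index and query time as long as the same analyzer is used for both. Capital Greek letters and mixed-case English names are not covered by this chain, so they would need extra map entries or a lowercasing step.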
