preventing certain docs from being indexed in clucene

Question

Asked by duffy on 14 August 2013 at 14:40 · 108 views

I am building a search index with CLucene, and I want to make sure documents containing any offensive terms never get added to the index. Using a StandardAnalyzer with a stop list is not good enough, since the offensive document still gets added and would be returned for non-offensive searches.

Instead, I am hoping to build up a document, check whether it contains any offensive words, and add it only if it doesn't.

Cheers!

There is 1 answer

**synhershko** · Accepted Answer · 2013-10-16T21:38:45+00:00

You can't really access that type of data in a Document: it only holds the field values, and the tokens are produced later, when the text is analyzed at indexing time.

What you can do is run the analysis chain manually on the text and check each token individually. You can do this in a simple loop, or by adding another token filter to the chain that just raises a flag you check later.

This introduces some more work, but it is the best way to achieve that, IMO.
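The "simple loop" variant could look roughly like the sketch below. It is not CLucene code: `tokenize` is a hypothetical, hand-rolled stand-in for the analysis chain (lowercase, split on non-alphanumerics), and `containsBlockedTerm` and `addIfClean` are made-up names, with a plain `std::vector` standing in for the index. In a real setup the tokens would come from the analyzer's token stream, so that the check sees exactly what the index would see.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the analysis chain: lowercases the text and
// splits it on anything that is not an ASCII letter or digit.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}

// Returns true if any token of `text` appears in `blocklist`.
// Blocklist entries are assumed to be lowercase already.
bool containsBlockedTerm(const std::string& text,
                         const std::unordered_set<std::string>& blocklist) {
    for (const std::string& token : tokenize(text)) {
        if (blocklist.count(token) > 0) {
            return true;
        }
    }
    return false;
}

// Hypothetical gate in front of the index: the document is appended to
// `index` (a stand-in for the real index writer) only if it is clean.
// Returns true if the document was added.
bool addIfClean(const std::string& text,
                const std::unordered_set<std::string>& blocklist,
                std::vector<std::string>& index) {
    if (containsBlockedTerm(text, blocklist)) {
        return false;  // rejected before it ever reaches the index
    }
    index.push_back(text);
    return true;
}
```

Calling `addIfClean(text, blocklist, index)` for each incoming document keeps rejected documents out of the index entirely, which is the behaviour the question asks for. Matching is per whole token, so a blocklist entry does not match longer words that merely contain it.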
