Datastructures to improve the performance of text matching

Question

Datastructures to improve the performance of text matching

51 views Asked by pnv At 11 December 2014 at 10:49

I am working on categorizing some text into a category that suites to the text the most. As a first step, I am writing a simple text matching code. I am comparing words in a piece of text from a text set to words which indicate some categories.

The complexity of this simple search becomes too big O(n^4)!

Text : Many Hollywood movies are fantastic. Cinema lovers are addicted to them. ( n words in 1 sentence and m such sentences)

Categories could be: Movie, Songs, Sports etc. ( p categories each with x words)

Indicator words for movie-[movie, cinema, film ... ] (x words for one category)

So, the search time becomes O (m *n * p * x) which could be too big.

Can you suggest me some datastructure/ method to solve simplify the complexity?

Original Q&A

There are 1 answers

**mickeyandkaka** · Answer 1 · 2014-12-11T11:05:56+00:00

There is an algorithm called Aho–Corasick string matching algorithm based on trie, and for one category, it can check whether the word in the category is appear in the Text.

You can build p tries, and It will perform better than O(m * n * p * x). (I think will be O(p * m * (n + x) ) )

Here is Aho–Corasick_string_matching_algorithm

TechQA.

Datastructures to improve the performance of text matching

There are 1 answers

Related Questions in PERFORMANCE

Related Questions in ALGORITHM

Related Questions in NLP

Related Questions in TIME-COMPLEXITY

Related Questions in CATEGORIZATION

Popular Questions

Popular Tags

Trending Questions