Bitap algorithm for Fuzzy search example

137 views Asked by At

I was looking at this work page 31. It is talking about the Wu and Manber 1992 updated Bitap algorithm for Fuzzy search with Non-deterministic Automata. To be more precise this formula: enter image description here enter image description here


Let's also look at this NFA graph which is used for finding words that are of distance 2 by Levenstein away from the word "survey": enter image description here As per my understanding the formula is used for each state. NFA graph tells us we approve 2 errors, which means we should calculate $R_{0}$, $R_{1}$ and $R_{2}$.

However how would we calculate this example:

Text = "surprise"

Pattern = "rise"

Then the Bitmask is: 

B["r"] = 1000

B["i"] = 0100

B["s"] = 0010

B["e"] = 0010

B[*] = 0000 where * stands for any other character

Okey for 0 errors we will get 0001 on the last letter as we only use $R'_{0}$. But anything that contains more than 0 errors gets me confused. Does the i value stands for index in text or for how many errors we approve.

I also found this Russian article which states that: enter image description here

k is number of errors, j is character index and $s_{x}$ is Bitmask. However the previous article stated that i are states of the NFA. Can you please explain how would the formula work for 1 error or 2, it is so confusing.

0

There are 0 answers