String search algorithm that returns recursive matches - Java

1.9k views Asked by At

The Rabin-Karp search algorithm is working fine but can anyone help to guide me in modifying it to a recursive search? http://algs4.cs.princeton.edu/53substring/RabinKarp.java.html . For example:

 *  **pattern:** rar
 *  **text:**    abacadabrararbracabrarararacadabrabrarbracad 
 *  **match1:**          rar               
 *  **match2:**            rar
 *  **match3:**                     rar
 *  **match4:**                       rar
 *  **match5:**                         rar
 *  **match5:**                                     rar

Are there other faster algorithm for recursive text matching searches?

SOLUTION

Add external library from http://johannburkard.de/software/stringsearch/ to build path. The code below will return all the starting position of the matches. inclusive of embedded ones like match1 and match2.

import com.eaio.stringsearch.BNDM;

String pattern = "rar";
String text = "abacadabrararbracabrarararacadabrabrarbracad";

// Loop through text to get starting position of matched pattern.
List<Integer> matchPoint =new ArrayList<Integer>();
int slice = -1;
while (slice<text.length()){
    slice+=1;
    com.eaio.stringsearch.BNDM result = new BNDM();
    int pos = result.searchString(text, slice, pattern);
    if (pos != -1) {
        slice = pos;
        matchPoint.add(pos);
    }
}
2

There are 2 answers

3
Boris Strandjev On BEST ANSWER

Of course there is. I will not recommend using Rabin-Karp in case of searching a small pattern in string. KMP i.e Knuth-Morris-Pratt algorithm takes linear time and linear additional memory and can return all the matches without the case of collisions which are nag when dealing with Rabin-Karp. Please read the wiki for it. This algorithm is a bit harder to understand, but shorter to code and once you get it right you feel very satisfied.

0
Daniel Fischer On

For longer patterns, the Boyer-Moore algorithm or variants like Horspool's algorithm are generally faster. The Boyer-Moore algorithm isn't particularly well suited for large alphabets. If the text can be the full Unicode range, it would use a rather large shift table, but if the text is ASCII or latin1, the extra space for the lookup tables is small. For large alphabets, I also recommend KMP.