I have been trying to understand the Rabin-Karp algorithm for an algorithms class. I was having alot of trouble understanding it so I tried to implement it (I don't actually have to implement it). I think I properly understand everything except the rolling hash function. My algorithm currently only works If the pattern char[] matches at beginning of the text char[]. I cannot figure out where my rolling hash is going wrong. If someone could please point me in the direction of the error I would be very greatful.
Results text "my test string" pattern "my" - this comes back as matched pattern "test" - this shows not matched
private static int int_mod(int a, int b)
{
return (a%b +b)%b;
}
public static int rabin_Karp(char[] text, char[] pattern)
{
int textSize = text.length;
int patternSize = pattern.length;
int base = 257;
int primeMod = 1000000007;
if(textSize < patternSize)
return -1;n
int patternHash = 0;
for(int i = 0; i < patternSize; i++)
patternHash += int_mod(patternHash * base + pattern[i], primeMod);//This is only done once so put method here
System.out.println("patternHash: " + patternHash);
//calculate the value of the first hash
int segmentHash = 0;
for(int i = 0; i < patternSize; i++) //remove this, this will be duplicate
segmentHash += int_mod(segmentHash * base + text[i], primeMod);
System.out.println("segmentHash: " + segmentHash);
Boolean firstMatch = false;
if(segmentHash == patternHash)
{
firstMatch = true;
for(int i=0; i<pattern.length; i++)
{
if(pattern[i] != text[i])
firstMatch = false;
}
}
if (firstMatch == true)
{
return 0;
}
for(int i=1; i<textSize - patternSize; i++)
{
segmentHash += int_mod(segmentHash * base + text[i + pattern.length -1],primeMod);
segmentHash -= int_mod(segmentHash * base + text[i-1], primeMod);
System.out.println("segmentHash: " + segmentHash);
if(segmentHash == patternHash)
{
firstMatch = true;
for(int j=0; j<pattern.length; j++)
{
if(pattern[j] != text[j])
firstMatch = false;
}
}
if (firstMatch == true)
{
return i;
}
}
return -1;
}
You have a couple of fundamental problems in your code.
The first one is here:
patternHash += int_mod(patternHash * base + pattern[i], primeMod);
It is duplicated on a few more places.The second one is in calculating the rowing hash:
Both this errors can be easily fixed. However, I will suggest you to understand better the logic behind the code instead of just copying it from somewhere. The hashing algorithm you've used is based on polynomials, so try to see what happens on that level. Maybe even write a few examples by hand -- they will be useful while debugging the code.
Note also that you will have problems with integer overflowing:
- int can store numbers which are up to ~2 billion;
- your prime module is ~1 billion so the hashes (
patternHash
andsegmentHash
in particular) can be up to that number;- your base is
int base = 257;
Thus, the expression
segmentHash * base
can be up to ~257 billion, which will surely be an integer overflow.