Karp-Rabin string matching algorithm
-
given two string
s&t: doessoccur as a substring oft?
t: [ | | | | | | | | | | | | | | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
Time:
- `theta(|s| * |t| - |s|)
- `theta(|s| * |t|) # length of s * length of t
Can we use hashing to get this down to linear time?
- We are looking for
O(|s|+|t|)length ofs+tis better thanstimest.
Rolling hash ADT:
-
r.append(c):- add char.c to end of x
-
r.skip(c):- delete fist character of x (assuming it is c)
-
rmaintains a stringxr()hash value of x =h(x)
for c in s: rs.append(c)
for c in t[:len(s)]:
rt.append(c)
if rs() == rt(): ...
for i in range(len(s), len(t)):
rt.skip(t[i - len(s)]
rt.append(t[i])
if rs() == rt:
s == t[i-len(s) + 1: i + i]
if equal:
found match
else:
happens with probability <= 1/|s|
division method:
h(k) = k mod mwheremis a random prime >= |s| (prime is greater than length of s)- treat string x as a multi digit number base is size of alphabet.
- ascii -> 256
- unicode -> x
r.append(c):
u -> u * a + ord(c)
r -> r * a + ord(c) mod m
r.skip():
u -> u - c * a^|u|-1