Commit b08b548
doc/mit-ocw/hash.md
@@ -1,8 +1,8 @@
# Hashing
-https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-8-hashing-with-chaining/
+[video][mit-ocw]
-movitation
+## Motivation
* database
* compilers and interpreters
@@ -15,16 +15,23 @@ Direct Access table:
----------
0 | |
+ ----------
1 | |
+ ----------
2 | |
+ ----------
3 | |
+ ----------
4 | |
+ ----------
| ... |
+ ----------
| ... |
+ ----------
| ... |
----------
-Hash requires mapping to an integer.
+> Hash requires mapping to an integer.
ideally:
@@ -116,8 +123,40 @@ Analysis:
2. multiplication method:
* `h(k) = [(a*k) mod 2^w] >> (w - r)`
3. Universal hasing:
- * `h(k) = [(ak+b) mod p] mod m
+ * `h(k) = [(ak+b) mod p] mod m`
* `a`, `b` are random between 0 and `p-1`. p is a big prime number larger than the universe.
- * {0, ..,p-1}
+ * `{0, ..,p-1}`
* worst case keys: k1 != k2
* `1/m`
+
+## How to choose m?
+
+* Want `m = theta(n)`
+ * `0(1)`
+
+Idea:
+
+> Start small; grow/shrink as necessary
+
+E.g.
+
+* If n > m: grow table
+
+* grow table:
+ * m -> m'
+ * make table of size `m'`
+ * build new hash function `h'`
+ * rehash
+ * for item in T:
+ * t:insert(item)
+ * `m' = 2m` table doubling
+
+Amortization:
+* operation takes `T(n) amortized`
+ * if `k` operations take <= `k*T(n)` time
+* spread out the high cost so you pay a little bit, but more often.
+* ~ Think of meaning `T(n)` on average, where average is taken over all the oeprations.
+
+> spread out the high cost so that it's cheap on average all the time.
+
+[mit-ocw]: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-8-hashing-with-chaining/
doc/mit-ocw/kr.md
@@ -0,0 +1,74 @@
+# Karp-Rabin string matching algorithm
+* https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-9-table-doubling-karp-rabin/
+
+* given two string `s` & `t`: does `s` occur as a substring of `t`?
+
+```plaintext
+t: [ | | | | | | | | | | | | | | | | ]
+s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+ s: [ | | | ]
+```
+
+Time:
+ * `theta(|s| * |t| - |s|)
+ * `theta(|s| * |t|) # length of s * length of t
+
+Can we use hashing to get this down to linear time?
+
+* We are looking for `O(|s|+|t|)` length of `s` + `t` is better than `s` times `t`.
+
+Rolling hash ADT:
+
+* `r.append(c)`:
+ * add char.c to end of x
+* `r.skip(c)`:
+ * delete fist character of x (assuming it is c)
+
+
+* `r` maintains a string `x`
+ * `r()` hash value of x = `h(x)`
+
+```python
+for c in s: rs.append(c)
+for c in t[:len(s)]:
+ rt.append(c)
+if rs() == rt(): ...
+for i in range(len(s), len(t)):
+ rt.skip(t[i - len(s)]
+ rt.append(t[i])
+ if rs() == rt:
+ s == t[i-len(s) + 1: i + i]
+ if equal:
+ found match
+ else:
+ happens with probability <= 1/|s|
+```
+
+division method:
+* `h(k) = k mod m` where `m` is a random prime >= |s| (prime is greater than length of s)
+* treat string x as a multi digit number base is size of alphabet.
+ * ascii -> 256
+ * unicode -> x
+
+```python
+r.append(c):
+ u -> u * a + ord(c)
+ r -> r * a + ord(c) mod m
+
+r.skip():
+ u -> u - c * a^|u|-1
+```
+
+* https://www.youtube.com/watch?v=qQ8vS2btsxI