Last modified: September 01, 2025
Searching
Searching refers to the process of finding the location of a specific element within a collection of data, such as an array, list, tree, or graph. It underpins many applications, from databases and information retrieval to routing and artificial intelligence. Depending on the organization of the data, different search techniques are used, such as linear search for unsorted data, binary search for sorted data, and more advanced approaches like hash-based lookup or tree traversals for hierarchical structures. Efficient searching is important because it directly impacts the performance and scalability of software systems.
Linear & Sequential Search
Linear Search
Scan the list from left to right, comparing the target with each element until you either find a match (return its index) or finish the list (report "not found").
Example inputs and outputs
Example 1
$$ \text{Input: } [7, 3, 5, 2, 9], \quad \text{target} = 5 $$
$$ \text{Output: } \text{index} = 2 $$
Example 2
$$ \text{Input: } [4, 4, 4], \quad \text{target} = 4 $$
$$ \text{Output: } \text{index} = 0 (\text{first match}) $$
Example 3
$$ \text{Input: } [10, 20, 30], \quad \text{target} = 25 $$
$$ \text{Output: } \text{not found} $$
How Linear Search Works
We start at index 0, compare the value with the target, and keep moving right until we either find it or reach the end.
Target 5 in [7, 3, 5, 2, 9]
Indexes: 0 1 2 3 4
List: [7] [3] [5] [2] [9]
Target: 5
Step 1: pointer at index 0
|
v
7  3  5  2  9
→ compare 7 vs 5 → no

Step 2: pointer moves to index 1
   |
   v
7  3  5  2  9
→ compare 3 vs 5 → no

Step 3: pointer moves to index 2
      |
      v
7  3  5  2  9
→ compare 5 vs 5 → YES ✓
→ return index 2
Worst Case (Not Found)
Target 9 in [1, 2, 3]
Indexes: 0 1 2
List: [1] [2] [3]
Target: 9
Checks:
- 1 ≠ 9
- 2 ≠ 9
- 3 ≠ 9
→ end
→ not found ✗
- Works on any list; no sorting or structure required.
- Returns the first index containing the target; if absent, reports "not found."
- Time: $O(n)$ comparisons on average and in the worst case; best case $O(1)$ if the first element matches.
- Space: $O(1)$ extra memory.
- Naturally finds the earliest occurrence when duplicates exist.
- Simple and dependable for short or unsorted data.
- Assumes 0-based indexing in these notes.
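The scan above is only a few lines in Python. A minimal sketch, with -1 standing in for "not found":

```python
def linear_search(arr, target):
    """Return the index of the first match, or -1 if target is absent."""
    for i, value in enumerate(arr):
        if value == target:
            return i          # first occurrence wins
    return -1

print(linear_search([7, 3, 5, 2, 9], 5))  # 2
print(linear_search([10, 20, 30], 25))    # -1 (not found)
```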
Sentinel Linear Search
Place one copy of the target at the very end as a "sentinel" so the scan can run without checking bounds each step; afterward, decide whether the match was inside the original list or only at the sentinel position.
Example inputs and outputs
Example 1
$$ \text{Input: } [12, 8, 6, 15], \quad \text{target} = 6 $$
$$ \text{Output: } \text{index} = 2 $$
Example 2
$$ \text{Input: } [2, 4, 6, 8], \quad \text{target} = 5 $$
$$ \text{Output: } \text{not found } (\text{only the sentinel matched}) $$
How it works
Put the target at one extra slot at the end so the loop is guaranteed to stop on a match; afterward, check whether the match was inside the original range.
Target 11 not in the list
Original list (n=5):
[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ]
Target: 11
Add sentinel (extra slot):
[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ][ 11 ]
0 1 2 3 4 5 β sentinel
Scan step by step:
4 ≠ 11 → pointer at 0
9 ≠ 11 → pointer at 1
1 ≠ 11 → pointer at 2
7 ≠ 11 → pointer at 3
6 ≠ 11 → pointer at 4
11 = 11 → pointer at 5 (sentinel)
Therefore, not found in original list.
Target 6 inside the list
Original list (n=4):
[ 12 ][ 8 ][ 6 ][ 15 ]
Target: 6
Add sentinel:
[ 12 ][ 8 ][ 6 ][ 15 ][ 6 ]
0 1 2 3 4
Scan:
12 ≠ 6 → index 0
8 ≠ 6 → index 1
6 = 6 → index 2 ✓
- Removes the per-iteration "have we reached the end?" check; the sentinel guarantees termination.
- Same $O(n)$ time in big-O terms, but slightly fewer comparisons in tight loops.
- Space: needs one extra slot; if you cannot append, you can temporarily overwrite the last element (store it, write the target, then restore it).
- After scanning, decide by index: if the first match index < original length, it's a real match; otherwise, it's only the sentinel.
- Use when micro-optimizing linear scans over arrays where bounds checks are costly.
- Behavior with duplicates: still returns the first occurrence within the original range.
- Be careful to restore any overwritten last element if you used the in-place variant.
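A minimal Python sketch of the append-then-restore variant; the function name and the -1 convention are this example's own:

```python
def sentinel_search(arr, target):
    """Linear search with a sentinel copy of target appended at the end."""
    n = len(arr)
    arr.append(target)           # sentinel guarantees the loop stops
    i = 0
    while arr[i] != target:      # no per-iteration bounds check needed
        i += 1
    arr.pop()                    # restore the original list
    return i if i < n else -1    # i == n means only the sentinel matched

print(sentinel_search([12, 8, 6, 15], 6))  # 2
print(sentinel_search([2, 4, 6, 8], 5))    # -1 (only the sentinel matched)
```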
Divide & Conquer Search
Binary Search
On a sorted array, repeatedly halve the search interval by comparing the target to the middle element until found or the interval is empty.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [2, 5, 8, 12, 16, 23, 38], \quad \text{target} = 16 $$
$$ \text{Output: } \text{index} = 4 $$
Example 2
$$ \text{Input: } A = [1, 3, 3, 3, 9], \quad \text{target} = 3 $$
$$ \text{Output: } \text{index} = 2 \quad (\text{any valid match; first/last needs a variant}) $$
Example 3
$$ \text{Input: } A = [10, 20, 30, 40], \quad \text{target} = 35 $$
$$ \text{Output: } \text{not found} $$
How it works
We repeatedly check the middle element, and then discard half the list based on comparison.
Find 16 in:
A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i = 0 1 2 3 4 5 6
Step 1
low = 0, high = 6
mid = (0+6)//2 = 3
A[3] = 12 < 16 → target is to the RIGHT → new low = mid + 1 = 4
A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i =   0    1    2    3     4     5     6
      ↑L             ↑M                ↑H
      0              3                 6
Active range: indices 0..6
Step 2
low = 4, high = 6
mid = (4+6)//2 = 5
A[5] = 23 > 16 → target is to the LEFT → new high = mid - 1 = 4
A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i =   0    1    2    3     4     5     6
                           ↑L    ↑M    ↑H
                           4     5     6
Active range: indices 4..6
Step 3
low = 4, high = 4
mid = 4
A[4] = 16 == target ✓
A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i =   0    1    2    3     4     5     6
                           ↑LMH
                           4
Active range: indices 4..4
FOUND at index 4
- Requires a sorted array (assume ascending here).
- Time: $O(\log n)$; Space: $O(1)$ iterative.
- Returns any one matching index by default; βfirst/last occurrenceβ is a small, common refinement.
- Robust, cache-friendly, and a building block for many higher-level searches.
- Beware of off-by-one errors when shrinking bounds.
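A compact iterative sketch in Python, mirroring the low/mid/high bookkeeping above:

```python
def binary_search(arr, target):
    """Iterative binary search on an ascending array; -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1        # target is to the right
        else:
            high = mid - 1       # target is to the left
    return -1

print(binary_search([2, 5, 8, 12, 16, 23, 38], 16))  # 4
print(binary_search([10, 20, 30, 40], 35))           # -1 (not found)
```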
Ternary Search
Like binary, but splits the current interval into three parts using two midpoints; used mainly for unimodal functions or very specific array cases.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [1, 4, 7, 9, 12, 15], \quad \text{target} = 9 $$
$$ \text{Output: } \text{index} = 3 $$
Example 2
$$ \text{Input: } A = [2, 6, 10, 14], \quad \text{target} = 5 $$
$$ \text{Output: } \text{not found} $$
How it works
We divide the array into three parts using two midpoints m1 and m2:

- If target < A[m1] → search $[low .. m1-1]$
- Else if target > A[m2] → search $[m2+1 .. high]$
- Else → search $[m1+1 .. m2-1]$
A = [ 1 ][ 4 ][ 7 ][ 9 ][ 12 ][ 15 ]
i = 0 1 2 3 4 5
Target: 9
Step 1
low = 0, high = 5
m1 = low + (high - low)//3 = 0 + (5)//3 = 1
m2 = high - (high - low)//3 = 5 - (5)//3 = 3
A[m1] = 4
A[m2] = 9
A = [ 1 ][ 4 ][ 7 ][ 9 ][ 12 ][ 15 ]
i = 0 1 2 3 4 5
      ↑L   ↑m1       ↑m2        ↑H
      0    1         3          5
FOUND at index 3
- Also assumes a sorted array.
- For discrete sorted arrays, it does not beat binary search asymptotically; it performs more comparisons per step.
- Most valuable for searching the extremum of a unimodal function on a continuous domain; for arrays, prefer binary search.
- Complexity: $O(\log n)$ steps but with larger constant factors than binary search.
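A Python sketch of the two-midpoint loop; like binary search, it returns any one matching index:

```python
def ternary_search(arr, target):
    """Search a sorted array using two midpoints m1 and m2; -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high:
        third = (high - low) // 3
        m1, m2 = low + third, high - third
        if arr[m1] == target:
            return m1
        if arr[m2] == target:
            return m2
        if target < arr[m1]:
            high = m1 - 1                 # leftmost third
        elif target > arr[m2]:
            low = m2 + 1                  # rightmost third
        else:
            low, high = m1 + 1, m2 - 1    # middle third
    return -1

print(ternary_search([1, 4, 7, 9, 12, 15], 9))  # 3
print(ternary_search([2, 6, 10, 14], 5))        # -1 (not found)
```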
Jump Search
On a sorted array, jump ahead in fixed block sizes to find the block that may contain the target, then do a linear scan inside that block.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [1, 4, 9, 16, 25, 36, 49], \quad \text{target} = 25, \quad \text{jump} = \lfloor \sqrt{7} \rfloor = 2 $$
$$ \text{Output: } \text{index} = 4 $$
Example 2
$$ \text{Input: } A = [3, 8, 15, 20, 22, 27], \quad \text{target} = 21, \quad \text{jump} = 2 $$
$$ \text{Output: } \text{not found} $$
How it works
We're applying jump search to find $25$ in
$$ A = [1, 4, 9, 16, 25, 36, 49] $$
with $n=7$, block size $\approx \sqrt{7} \approx 2$, so jump=2.
We probe every 2nd index:
- probe = 0 → $A[0] = 1 < 25$ → jump to 2
- probe = 2 → $A[2] = 9 < 25$ → jump to 4
- probe = 4 → $A[4] = 25 \geq 25$ → stop

So the target is in block $(2..4]$.
[ 1 ][ 4 ] | [ 9 ][16 ] | [25 ][36 ] | [49 ]
^ ^ ^ ^
probe=0 probe=2 probe=4 probe=6
Linear Scan in block (indexes 3..4)
- i = 3 → $A[3] = 16 < 25$
- i = 4 → $A[4] = 25 = 25$ → FOUND
Block [16 ][25 ]
^ ^
i=3 i=4 (found!)
The element $25$ is found at index 4.
- Works on sorted arrays; pick jump ≈ √n for a good balance.
- Time: $O(\sqrt{n})$ comparisons on average; Space: $O(1)$.
- Useful when random access is cheap but full binary search isn't desirable (e.g., limited CPU branch prediction, or when scanning in blocks is cache-friendly).
- Degrades gracefully to "scan block then stop."
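A Python sketch; `math.isqrt` picks the block size, and -1 signals "not found":

```python
import math

def jump_search(arr, target):
    """Jump in sqrt(n)-sized blocks, then scan the candidate block; -1 if absent."""
    n = len(arr)
    step = max(1, math.isqrt(n))
    prev = 0
    # Jump ahead while the last element of the current block is still too small.
    while prev < n and arr[min(prev + step, n) - 1] < target:
        prev += step
    # Linear scan inside the block that may contain the target.
    for i in range(prev, min(prev + step, n)):
        if arr[i] == target:
            return i
    return -1

print(jump_search([1, 4, 9, 16, 25, 36, 49], 25))  # 4
print(jump_search([3, 8, 15, 20, 22, 27], 21))     # -1 (not found)
```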
Exponential Search
On a sorted array, grow the right boundary exponentially (1, 2, 4, 8, β¦) to find a containing range, then finish with binary search in that range.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [2, 3, 5, 7, 11, 13, 17, 19, 23], \quad \text{target} = 19 $$
$$ \text{Output: } \text{index} = 7 $$
Example 2
$$ \text{Input: } A = [10, 20, 30, 40, 50], \quad \text{target} = 12 $$
$$ \text{Output: } \text{not found} $$
How it works
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i = 0 1 2 3 4 5 6 7 8
Target = 19
Find range by exponential jumps
Start at i=1, double each step until A[i] ≥ target (or the end).
Jump 1: i=1
A[i]=3 ≤ 19 → continue
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =   0    1    2    3    4     5     6     7     8
           ↑
Jump 2: i=2
A[i]=5 ≤ 19 → continue
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =   0    1    2    3    4     5     6     7     8
                ↑
Jump 3: i=4
A[i]=11 ≤ 19 → continue
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =   0    1    2    3    4     5     6     7     8
                          ↑
Jump 4: i=8
A[i]=23 > 19 → stop
Range is (previous power of two .. i] = (4 .. 8] → search indices 5..8
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =   0    1    2    3    4     5     6     7     8
                                                  ↑
Range for binary search: low=5, high=8.
Binary search on $A[5..8]$
Subarray: [ 13 ][ 17 ][ 19 ][ 23 ]
Indices : 5 6 7 8
Step 1
low=5, high=8 → mid=(5+8)//2=6
A[6]=17 < 19 → move right → low=7
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =   0    1    2    3    4     5     6     7     8
                                ↑L    ↑M          ↑H
                                5     6           8
Step 2
low=7, high=8 → mid=(7+8)//2=7
A[7]=19 == target ✓
→ FOUND
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =   0    1    2    3    4     5     6     7     8
                                            ↑L,M  ↑H
                                            7     8
Found at index 7.
- Great when the target is likely to be near the beginning or when the array is unbounded/stream-like but sorted (you can probe indices safely).
- Time: $O(\log p)$ to find the range where p is the final bound, plus $O(\log p)$ for binary search → overall $O(\log p)$.
- Space: $O(1)$.
- Often paired with data sources where you can test βis index i valid?β while doubling i.
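A Python sketch: double the bound, then run a plain binary search on the bracketed range:

```python
def exponential_search(arr, target):
    """Double the right bound until it passes target, then binary-search."""
    n = len(arr)
    if n == 0:
        return -1
    if arr[0] == target:
        return 0
    i = 1
    while i < n and arr[i] <= target:
        i *= 2                              # 1, 2, 4, 8, ...
    low, high = i // 2, min(i, n - 1)       # target can only be in (i/2 .. i]
    while low <= high:                      # ordinary binary search
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

print(exponential_search([2, 3, 5, 7, 11, 13, 17, 19, 23], 19))  # 7
print(exponential_search([10, 20, 30, 40, 50], 12))              # -1 (not found)
```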
Interpolation Search
On a sorted (roughly uniformly distributed) array, estimate the likely position using the values themselves and probe there; repeat on the narrowed side.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [10, 20, 30, 40, 50, 60, 70], \quad \text{target} = 55 $$
$$ \text{Output: } \text{not found } (\text{probes near indices } 4\text{–}5) $$
Example 2
$$ \text{Input: } A = [5, 15, 25, 35, 45, 55, 65], \quad \text{target} = 45 $$
$$ \text{Output: } \text{index} = 4 $$
Example 3
$$ \text{Input: } A = [1, 1000, 1001, 1002], \quad \text{target} = 2 $$
$$ \text{Output: } \text{not found } (\text{bad distribution for interpolation}) $$
How it works
- Guard against division by zero: if A[high] == A[low], stop (or fall back to binary search).
- Clamp the computed pos to [low, high] before probing.
- Works best when values are uniformly distributed; otherwise it can degrade toward linear time.
- Assumes A is sorted and values are roughly uniform.

Probe formula:

pos ≈ low + (high - low) * (target - A[low]) / (A[high] - A[low])
Say we have the following array and target:
A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
i = 0 1 2 3 4 5 6
target = 45
Step 1 – initial probe

low=0 (A[0]=10), high=6 (A[6]=70)
pos ≈ 0 + (6-0) * (45-10)/(70-10)
    ≈ 6 * 35/60
    ≈ 3.5 → probe near 3.5
A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
i =    0     1     2     3     4     5     6
      ↑L                                  ↑H
pos ≈ 3.5 → choose ⌊pos⌋ = 3 (or ⌈pos⌉ = 4)
Probe index 3: A[3] = 40 < 45
→ set low = 3 + 1 = 4
Step 2 – after moving low

A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
i =    0     1     2     3     4     5     6
                              ↑L          ↑H
At this point, an early-stop check already tells us target (45) < A[low] (50)
→ cannot exist in A[4..6]
→ not found.
- Best on uniformly distributed sorted data; expected time $O(\log \log n)$.
- Worst case can degrade to $O(n)$, especially on skewed or clustered values.
- Space: $O(1)$.
- Very fast when value-to-index mapping is close to linear (e.g., near-uniform numeric keys).
- Requires careful handling when A[high] = A[low] (avoid division by zero); also sensitive to integer rounding in discrete arrays.
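A Python sketch that includes the division-by-zero guard and clamping mentioned above:

```python
def interpolation_search(arr, target):
    """Interpolation search on a sorted, roughly uniform array; -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high and arr[low] <= target <= arr[high]:
        if arr[high] == arr[low]:                  # avoid division by zero
            return low if arr[low] == target else -1
        # Estimate the likely position from the values themselves.
        pos = low + (high - low) * (target - arr[low]) // (arr[high] - arr[low])
        pos = max(low, min(pos, high))             # clamp into [low, high]
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1

print(interpolation_search([5, 15, 25, 35, 45, 55, 65], 45))    # 4
print(interpolation_search([10, 20, 30, 40, 50, 60, 70], 45))   # -1 (not found)
```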
Hash-based Search
- Separate chaining: Easiest deletions, steady $O(1)$ with α ≈ 1; good when memory fragmentation isn't a concern.
- Open addressing (double hashing): Best probe quality among OA variants; great cache locality; keep α < 0.8.
- Open addressing (linear/quadratic): Simple and fast at low α; watch clustering and tombstones.
- Cuckoo hashing: Tiny and predictable lookup cost; inserts costlier and may rehash; great for read-heavy workloads.
- In all cases: pick strong hash functions and resize early to keep α healthy.
Hash Table Search
Map a key to an array index with a hash function; look at that bucket to find the key, giving expected $O(1)$ lookups under a good hash and healthy load factor.
Example inputs and outputs
Example 1
$$ \text{Table size: } m = 7, \quad \text{Keys stored: } \{10, 24, 31\}, \quad \text{Target: } 24 $$
$$ \text{Output: } \text{found (bucket 3)} $$
Example 2
$$ \text{Table size: } m = 7, \quad \text{Keys stored: } \{10, 24, 31\}, \quad \text{Target: } 18 $$
$$ \text{Output: } \text{not found} $$
How it works
+-----+ hash +-----------------+ search/compare +--------+
| key | --------------> | index in array | ----------------------> | match? |
+-----+ +-----------------+ +--------+
- With chaining, the "collision path" is the list inside one bucket.
- With linear probing, the "collision path" is the probe sequence across buckets (3 → 4 → 5 → …).
- Both follow the same flow: hash → inspect bucket (and collision path) → match?
Array (buckets/indexes 0..6):
Idx: 0 1 2 3 4 5 6
+---+-----+-----+-----+-----+-----+-----+
| | | | | | | |
+---+-----+-----+-----+-----+-----+-----+
Example mapping with h(k) = k mod 7: the stored keys {10, 24, 31} all hash to index 3.
Strategy A – Separate Chaining (linked list per bucket)
Insertions
10 -> 3
24 -> 3 (collides with 10; append to bucket[3] list)
31 -> 3 (collides again; append to bucket[3] list)
Idx: 0 1 2 3 4 5 6
+---+-----+-----+-----+-----+-----+-----+
|   |     |     |  •  |     |     |     |
+---+-----+-----+-----+-----+-----+-----+
bucket[3] chain: [10] → [24] → [31] → ∅
Search(24)
1) Compute index = h(24) = 3
2) Inspect bucket 3's chain:
   [10] → [24] → [31]
           ↑ found here
3) Return FOUND (bucket 3)
Strategy B – Open Addressing (Linear Probing)
Insertions
10 -> 3 place at 3
24 -> 3 (occupied) → probe 4 → place at 4
31 -> 3 (occ) → 4 (occ) → probe 5 → place at 5
Idx: 0 1 2 3 4 5 6
+---+-----+-----+-----+-----+-----+-----+
| | | | 10 | 24 | 31 | |
+---+-----+-----+-----+-----+-----+-----+
Search(24)
1) Compute index = h(24) = 3
2) Probe sequence:
3: 10 ≠ 24 → continue
4: 24 = target → FOUND at index 4
(If not found, continue probing until an empty slot or wrap limit.)
- Quality hash + low load factor (α = n/m) → expected $O(1)$ search/insert/delete.
- Collisions are inevitable; the collision strategy (open addressing vs. chaining vs. cuckoo) dictates actual steps.
- Rehashing (growing and re-inserting) is used to keep α under control.
- Uniform hashing assumption underpins the $O(1)$ expectation; adversarial keys or poor hashes can degrade performance.
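For everyday work, Python's built-in dict already implements this hash-then-compare lookup (collision handling and resizing happen internally):

```python
table = {10: "x", 24: "y", 31: "z"}   # keys are hashed to buckets internally
print(24 in table)  # True  → expected O(1) membership test
print(18 in table)  # False
```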
Open Addressing – Linear Probing
Keep everything in one array; on collision, probe alternative positions in a deterministic sequence until an empty slot or the key is found.
Example inputs and outputs
Example 1
$$ m = 10, \quad \text{Stored keys: } \{12, 22, 32\}, \quad \text{Target: } 22 $$
$$ \text{Output: } \text{found (index 3)} $$
Example 2
$$ m = 10, \quad \text{Stored keys: } \{12, 22, 32\}, \quad \text{Target: } 42 $$
$$ \text{Output: } \text{not found} $$
How it works
Hash function:
h(k) = k mod 10
Probe sequence: i, i+1, i+2, ... (wrap around)
Insertions
- Insert 12 → h(12)=2 → place at index 2
- Insert 22 → h(22)=2 occupied → probe 3 → place at 3
- Insert 32 → h(32)=2 occupied → probe 3 (occupied) → probe 4 → place at 4
Resulting table (indexes 0..9):
Index: 0 1 2 3 4 5 6 7 8 9
+---+---+----+----+----+---+---+---+---+---+
Value: | | | 12 | 22 | 32 | | | | | |
+---+---+----+----+----+---+---+---+---+---+
Search(22)
- Start at h(22)=2
- index 2 → 12 ≠ 22 → probe
- index 3 → 22 → FOUND ✓

Path followed: 2 → 3
Search(42)
- Start at h(42)=2
- index 2 → 12 ≠ 42 → probe
- index 3 → 22 ≠ 42 → probe
- index 4 → 32 ≠ 42 → probe
- index 5 → empty slot → stop → NOT FOUND

Path followed: 2 → 3 → 4 → 5 (✗)
- Simple and cache-friendly; clusters form ("primary clustering") which can slow probes.
- Deletion uses tombstones to keep probe chains intact.
- Performance depends sharply on load factor; keep α well below 1 (e.g., α ≤ 0.7).
- Expected search ~$O(1)$ at low α; degrades as clusters grow.
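A minimal fixed-size sketch of the probe loop (insert and search only; deletion with tombstones is omitted, and the table is assumed never to fill up):

```python
class LinearProbingTable:
    """Fixed-size open-addressing table with linear probing (sketch)."""
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def insert(self, key):
        i = key % self.m
        while self.slots[i] is not None:   # probe i, i+1, i+2, ... (mod m)
            if self.slots[i] == key:
                return                     # already stored
            i = (i + 1) % self.m
        self.slots[i] = key

    def search(self, key):
        i = key % self.m
        for _ in range(self.m):
            if self.slots[i] is None:      # empty slot ends the probe chain
                return -1
            if self.slots[i] == key:
                return i
            i = (i + 1) % self.m
        return -1

t = LinearProbingTable(10)
for k in (12, 22, 32):
    t.insert(k)
print(t.search(22))  # 3
print(t.search(42))  # -1 (probe stops at the first empty slot)
```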
Open Addressing – Quadratic Probing
Example inputs and outputs
Example 1
$$ m = 11 \ (\text{prime}), \quad \text{Stored keys: } \{22, 33, 44\}, \quad \text{Target: } 33 $$
$$ h(k) = k \bmod m, \quad h(33) = 33 \bmod 11 = 0 $$
$$ \text{Output: found (index 4)} $$
Example 2
$$ m = 11 \ (\text{prime}), \quad \text{Stored keys: } \{22, 33, 44\}, \quad \text{Target: } 55 $$
$$ h(55) = 55 \bmod 11 = 0 $$
$$ \text{Output: not found} $$
How it works
Hash function:
h(k) = k mod 11
Probe sequence (relative offsets):
+1², +2², +3², ... (mod 11)
= +1, +4, +9, +5, +3, ... (wrapping around the table size)
So from h(k), we try slots in this order:
h, h+1, h+4, h+9, h+5, h+3, ... (all mod 11)
Insertions
- Insert 22 → h(22)=0 → place at index 0
- Insert 33 → h(33)=0 occupied → try 0+1²=1 → index 1 free → place at 1
- Insert 44 → h(44)=0 occupied → probe 1 (occupied) → probe 0+4=4 → place at 4
Resulting table:
Idx:   0    1    2   3    4   5   6   7   8   9  10
     +----+----+---+---+----+---+---+---+---+---+---+
Val: | 22 | 33 |   |   | 44 |   |   |   |   |   |   |
     +----+----+---+---+----+---+---+---+---+---+---+
Search(33)
- Start h(33)=0 → slot 0 = 22 ≠ 33
- Probe 0+1²=1 → slot 1 = 33 → FOUND ✓

Path: 0 → 1
Search(55)
- Start h(55)=0 → slot 0 = 22 ≠ 55
- Probe 0+1²=1 → slot 1 = 33 ≠ 55
- Probe 0+2²=4 → slot 4 = 44 ≠ 55
- Probe 0+3²=9 → slot 9 = empty → stop → NOT FOUND

Path: 0 → 1 → 4 → 9 (✗)
- Reduces primary clustering but can exhibit secondary clustering (keys with same h(k) follow same probe squares).
- Table size choice matters (often prime); ensure the probe sequence can reach many slots.
- Keep α modest; deletion still needs tombstones.
- Expected $O(1)$ at healthy α; simpler than double hashing.
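The relative offsets are easy to generate; a tiny Python helper of this sketch's own design:

```python
def quadratic_probes(key, m):
    """Yield probe slots h, h+1^2, h+2^2, ... (mod m)."""
    h = key % m
    for i in range(m):
        yield (h + i * i) % m

print(list(quadratic_probes(33, 11))[:4])  # [0, 1, 4, 9], matching the walkthrough
```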
Open Addressing – Double Hashing
Example inputs and outputs
Hash functions
$$ h_{1}(k) = k \bmod 11, \quad h_{2}(k) = 1 + (k \bmod 10) $$
Probing sequence:
$$ h(k,i) = \big(h_{1}(k) + i \cdot h_{2}(k)\big) \bmod 11 $$
Example 1
$$ m = 11, \quad \text{Stored keys: } \{22, 33, 44\}, \quad \text{Target: } 33 $$
For $k = 33$:
$$ h_{1}(33) = 33 \bmod 11 = 0, \quad h_{2}(33) = 1 + (33 \bmod 10) = 1 + 3 = 4 $$
So probe sequence is
$$ h(33,0) = 0, h(33,1) = (0 + 1\cdot 4) \bmod 11 = 4, h(33,2) = (0 + 2\cdot 4) \bmod 11 = 8, \dots $$
Since the stored layout places $33$ at index $4$, the search succeeds.
$$ \text{Output: found (index 4)} $$
Example 2
$$ m = 11, \quad \text{Stored keys: } \{22, 33, 44\}, \quad \text{Target: } 55 $$
For $k = 55$:
$$ h_{1}(55) = 55 \bmod 11 = 0, \quad h_{2}(55) = 1 + (55 \bmod 10) = 1 + 5 = 6 $$
Probing sequence:
$$ 0, (0+6)\bmod 11 = 6, (0+2\cdot 6)\bmod 11 = 1, (0+3\cdot 6)\bmod 11 = 7, \dots $$
No slot matches $55$.
$$ \text{Output: not found} $$
How it works
We use two hash functions:
h₁(k) = k mod m
h₂(k) = 1 + (k mod 10)
Probe sequence:
i, i + h₂, i + 2·h₂, i + 3·h₂, ... (all mod m), where i = h₁(k)
This ensures fewer clustering issues compared to linear or quadratic probing.
Insertions (m = 11)
- Insert 22 → h₁(22)=0 → place at index 0
- Insert 33 → h₁(33)=0 (occupied); h₂(33)=1+(33 mod 10)=4 → probe sequence: 0, 4 → place at index 4
- Insert 44 → h₁(44)=0 (occupied); h₂(44)=1+(44 mod 10)=5 → probe sequence: 0, 5 → place at index 5
Table State
Idx:   0   1   2   3   4   5   6   7   8   9  10
     +---+---+---+---+---+---+---+---+---+---+---+
Val: |22 |   |   |   |33 |44 |   |   |   |   |   |
     +---+---+---+---+---+---+---+---+---+---+---+
Search(33)
- Start at h₁(33)=0 → slot 0 = 22 ≠ 33
- Next: 0+1·h₂(33)=0+4=4 → slot 4 = 33 → FOUND ✓

Path: 0 → 4
Search(55)
- h₁(55)=0, h₂(55)=1+(55 mod 10)=6
- slot 0 = 22 ≠ 55
- slot 6 = empty → stop → NOT FOUND

Path: 0 → 6 (✗)
- Minimizes clustering; probe steps depend on the key.
- Choose h₂ so it's non-zero and relatively prime to m, ensuring a full cycle.
- Excellent performance at higher α than linear/quadratic, but still sensitive as α → 1.
- Deletion needs tombstones; implementation slightly more complex.
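A Python sketch of the lookup loop, preloaded with the layout from the walkthrough:

```python
def double_hash_search(table, key, m=11):
    """Probe (h1 + i*h2) mod m until the key or an empty slot appears."""
    h1 = key % m
    h2 = 1 + (key % 10)
    for i in range(m):
        slot = (h1 + i * h2) % m
        if table[slot] is None:     # empty slot: the key cannot be present
            return -1
        if table[slot] == key:
            return slot
    return -1

table = [None] * 11
table[0], table[4], table[5] = 22, 33, 44   # layout from the walkthrough
print(double_hash_search(table, 33))  # 4
print(double_hash_search(table, 55))  # -1 (not found)
```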
Separate Chaining
Each array cell holds a small container (e.g., a linked list); colliding keys live together in that bucket.
Example inputs and outputs
Setup
$$ m = 5, \quad h(k) = k \bmod 5, \quad \text{buckets hold linked lists} $$
Keys stored:
$$ \{12, 22, 7, 3, 14\} $$
Bucket contents after hashing:
$$ \begin{aligned} h(12) &= 12 \bmod 5 = 2 &&\Rightarrow \text{bucket 2: } [12] \\ h(22) &= 22 \bmod 5 = 2 &&\Rightarrow \text{bucket 2: } [12, 22] \\ h(7) &= 7 \bmod 5 = 2 &&\Rightarrow \text{bucket 2: } [12, 22, 7] \\ h(3) &= 3 \bmod 5 = 3 &&\Rightarrow \text{bucket 3: } [3] \\ h(14) &= 14 \bmod 5 = 4 &&\Rightarrow \text{bucket 4: } [14] \end{aligned} $$
Example 1
$$ \text{Target: } 22 $$
$$ h(22) = 2 \Rightarrow \text{bucket 2} = [12, 22, 7] $$
Found at position 2 in the list.
$$ \text{Output: found (bucket 2, position 2)} $$
Example 2
$$ \text{Target: } 9 $$
$$ h(9) = 9 \bmod 5 = 4 \Rightarrow \text{bucket 4} = [14] $$
No match.
$$ \text{Output: not found} $$
How it works
h(k) = k mod 5
Buckets store small lists (linked lists or dynamic arrays)
Idx: 0 1 2 3 4
[ ] [ ] [ 12 → 22 → 7 ] [ 3 ] [ 14 ]
Search(22):
- Compute bucket b = h(22) = 2
- Linearly scan bucket 2 → find 22
Search(9):
- b = h(9) = 4
- Bucket 4: [14] → 9 not present → NOT FOUND
- Simple deletes (remove from a bucket) and no tombstones.
- Expected $O(1 + \alpha)$ time; with good hashing and α kept near/below 1, bucket lengths stay tiny.
- Memory overhead for bucket nodes; cache locality worse than open addressing.
- Buckets can use ordered lists or small vectors to accelerate scans.
- Rehashing still needed as n grows; α = n/m controls performance.
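A Python sketch where each bucket is a plain list:

```python
class ChainedTable:
    """Hash table with separate chaining; each bucket is a Python list."""
    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def insert(self, key):
        bucket = self.buckets[key % self.m]
        if key not in bucket:
            bucket.append(key)

    def search(self, key):
        return key in self.buckets[key % self.m]   # linear scan of one bucket

t = ChainedTable(5)
for k in (12, 22, 7, 3, 14):
    t.insert(k)
print(t.search(22))  # True  (bucket 2 holds [12, 22, 7])
print(t.search(9))   # False (bucket 4 holds only [14])
```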
Cuckoo Hashing
Keep two (or more) hash positions per key; insert by "kicking out" occupants to their alternate home so lookups check only a couple of places.
Example inputs and outputs
Setup Two hash tables $T_{1}$ and $T_{2}$, each of size
$$ m = 5 $$
Two independent hash functions:
$$ h_{1}(k), \quad h_{2}(k) $$
Cuckoo hashing invariant:
- Each key is stored either in $T_{1}[h_{1}(k)]$ or $T_{2}[h_{2}(k)]$.
- On insertion, if a spot is occupied, the existing key is kicked out and reinserted into the other table.
- If relocations form a cycle, the table is rebuilt (rehash) with new hash functions.
Example 1
$$ \text{Target: } 15 $$
Lookup procedure:
- Check $T_{1}[h_{1}(15)]$.
- If not found, check $T_{2}[h_{2}(15)]$.
Result:
$$ \text{found in } T_{2} \text{ at index } 4 $$
$$ \text{Output: found (T}_{2}\text{, index 4)} $$
Example 2
If insertion causes repeated displacements and eventually loops:
$$ \text{Cycle detected } \Rightarrow \text{rehash with new } h_{1}, h_{2} $$
$$ \text{Output: rebuild / rehash required} $$
How it works
We keep two hash tables (T₁, T₂), each with its own hash function. Every key can live in exactly one of two possible slots:
Hash functions:
$$ h_1(k) = k \bmod 5, \quad h_2(k) = 1 + (k \bmod 4) $$
Every key can live in exactly one of two slots: $T_1[h_1(k)]$ or $T_2[h_2(k)]$. If a slot is occupied, we evict the old occupant and reinsert it at its alternate location.
Start empty:
T₁: [   ][   ][   ][   ][   ]
T₂: [   ][   ][   ][   ][   ]
Insert 10 β goes to $T_1[h_1(10)=0]$:
T₁: [10 ][   ][   ][   ][   ]
T₂: [   ][   ][   ][   ][   ]
Insert 15
- $T_1[0]$ already has 10 β evict 10
- Place 15 at $T_1[0]$
- Reinsert evicted 10 at $T_2[h_2(10)=3]$:
T₁: [15 ][   ][   ][   ][   ]
T₂: [   ][   ][   ][10 ][   ]
Insert 20
- $T_1[0]$ has 15 β evict 15
- Place 20 at $T_1[0]$
- Reinsert 15 at $T_2[h_2(15)=4]$:
T₁: [20 ][   ][   ][   ][   ]
T₂: [   ][   ][   ][10 ][15 ]
Insert 25
- $T_1[0]$ has 20 β evict 20
- Place 25 at $T_1[0]$
- Reinsert 20 at $T_2[h_2(20)=1]$:
T₁: [25 ][   ][   ][   ][   ]
T₂: [   ][20 ][   ][10 ][15 ]
Search(15)
- $T_1[h_1(15)=0] \to 25 \neq 15$
- $T_2[h_2(15)=4] \to 15$ → FOUND
FOUND in T₂ at index 4
- Lookups probe at most two places (with two hashes) → excellent constant factors.
- Inserts may trigger a chain of evictions; detect cycles and rehash with new functions.
- High load factors achievable (e.g., ~0.5–0.9 depending on variant and number of hashes/tables).
- Deletions are easy (remove key); no tombstones, but ensure invariants remain.
- Sensitive to hash quality; poor hashes increase cycle risk.
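A Python sketch using the walkthrough's hash functions; `max_kicks` bounds the eviction chain before signaling that a rehash is needed:

```python
def h1(k): return k % 5
def h2(k): return 1 + (k % 4)

def cuckoo_insert(T1, T2, key, max_kicks=10):
    """Place key, evicting occupants to their alternate table; False on a cycle."""
    for _ in range(max_kicks):
        i = h1(key)
        key, T1[i] = T1[i], key       # drop key into T1, pick up old occupant
        if key is None:
            return True
        j = h2(key)
        key, T2[j] = T2[j], key       # relocate the evicted key into T2
        if key is None:
            return True
    return False                      # likely a cycle: rehash with new functions

def cuckoo_search(T1, T2, key):
    """Lookups check exactly two slots."""
    return T1[h1(key)] == key or T2[h2(key)] == key

T1, T2 = [None] * 5, [None] * 5
for k in (10, 15, 20, 25):
    cuckoo_insert(T1, T2, k)
print(T1)                          # [25, None, None, None, None]
print(T2)                          # [None, 20, None, 10, 15]
print(cuckoo_search(T1, T2, 15))   # True (found in T2 at index 4)
```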
Probabilistic & Approximate Search
Bloom Filter
Space-efficient structure for fast membership tests; answers βmaybe presentβ or βdefinitely not presentβ with a tunable false-positive rate and no false negatives (if built correctly, without deletions).
Example inputs and outputs
Setup
$$ m = 16 \ \text{bits}, \quad k = 3 \ \text{hash functions } (h_{1}, h_{2}, h_{3}) $$
Inserted set:
$$ {"cat", "dog"} $$
Example 1
$$ \text{Query: contains("cat")} $$
All $h_{i}(\text{"cat"})$ bits are set → actual member.
$$ \text{Output: maybe present (true positive)} $$
Example 2
$$ \text{Query: contains("cow")} $$
One probed bit = 0 → cannot be present.
$$ \text{Output: definitely not present} $$
Example 3
$$ \text{Query: contains("eel")} $$
All $h_{i}(\text{"eel"})$ bits happen to be set, even though "eel" was never inserted.
$$ \text{Output: maybe present (false positive)} $$
How it works
Initial state (all zeros):
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Insert "cat"
h1(cat) = 3, h2(cat) = 7, h3(cat) = 12
→ Set bits at 3, 7, 12
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0]
^ ^ ^
3 7 12
Insert "dog"
h1(dog) = 1, h2(dog) = 7, h3(dog) = 9
→ Set bits at 1, 7, 9 (7 already set)
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0]
^ ^ ^
1 7 9
Query "cow"
h1(cow) = 1 → bit[1] = 1
h2(cow) = 3 → bit[3] = 1
h3(cow) = 6 → bit[6] = 0 ✗
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0]
       ✓   ✓     ✗
At least one zero → DEFINITELY NOT PRESENT
Query "eel"
h1(eel) = 7  → bit[7] = 1
h2(eel) = 9  → bit[9] = 1
h3(eel) = 12 → bit[12] = 1
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0]
                   ✓   ✓     ✓
All ones → MAYBE PRESENT (could be a false positive)
- Answers: maybe present / definitely not present; never false negatives (without deletions).
- False-positive rate is tunable via bit-array size m, number of hashes k, and items n; more space & good k → lower FPR.
- Time: $O(k)$ per insert/lookup; Space: ~m bits.
- No deletions in the basic form; duplicates are harmless (idempotent sets).
- Union = bitwise OR; intersection = bitwise AND (for same m,k,hashes).
- Choose independent, well-mixed hash functions to avoid correlated bits.
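A runnable sketch; unlike the walkthrough's illustrative hash values, it derives the k positions from SHA-256, so the exact bits set will differ:

```python
import hashlib

class BloomFilter:
    """Bit-array Bloom filter; k positions per item derived from SHA-256."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # All probed bits set -> "maybe present"; any zero -> "definitely not".
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter(m=16, k=3)
bf.add("cat"); bf.add("dog")
print(bf.might_contain("cat"))  # True (maybe present)
print(bf.might_contain("cow"))  # likely False (definitely not present)
```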
Counting Bloom Filter
Bloom filter variant that keeps a small counter per bit so you can delete by decrementing; still probabilistic and may have false positives.
Example inputs and outputs
Setup
$$ m = 12 \ \text{counters (each 2–4 bits)}, \quad k = 3 \ \text{hash functions} $$
Inserted set:
$$ {\text{"alpha"}, \text{"beta"}} $$
Then delete "alpha".
Example 1
$$ \text{Query: contains("alpha")} $$
Counters for "alpha" were decremented; at least one probed counter is now $0$.
$$ \text{Output: definitely not present} $$
Example 2
$$ \text{Query: contains("beta")} $$
All three counters for "beta" remain $>0$.
$$ \text{Output: maybe present} $$
Example 3
$$ \text{Query: contains("gamma")} $$
At least one probed counter is $0$.
$$ \text{Output: definitely not present} $$
How it works
Each cell is a small counter (e.g., 4 bits, range 0..15). This allows deletions: increment on insert, decrement on delete.
Initial state
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 0 0 0 0 0 0 0 0 0 0]
Insert "alpha"
Hashes: {2, 5, 9}
→ Increment those counters
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 1 0 0 1 0 0 0 1 0 0]
         ^     ^       ^
         2     5       9
Insert "beta"
Hashes: {3, 5, 11}
→ Increment those counters
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 1 1 0 2 0 0 0 1 0 1]
           ^   ^           ^
           3   5           11
Lookup "beta"
Hashes: {3, 5, 11}
Counters = {1, 2, 1} → all > 0
→ Result: MAYBE PRESENT
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 1 1 0 2 0 0 0 1 0 1]
           ^   ^           ^
Delete "alpha"
Hashes: {2, 5, 9}
→ Decrement those counters
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 0 1 0 1 0 0 0 0 0 1]
         ^     ^       ^
         2     5       9
Lookup "alpha"
Hashes: {2, 5, 9}
Counters = {0, 1, 0}
→ At least one zero
→ Result: DEFINITELY NOT PRESENT
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 0 1 0 1 0 0 0 0 0 1]
         ^     ^       ^
- Supports deletion by decrementing counters; insertion increments.
- Still probabilistic: may return false positives; avoids false negatives if counters never underflow and hashes are consistent.
- Space: more than Bloom (a few bits per counter instead of 1).
- Watch for counter saturation (caps at max value) and underflow (don't decrement below 0).
- Good for dynamic sets with frequent inserts and deletes.
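The same skeleton with counters instead of bits; again, SHA-256-derived positions stand in for the walkthrough's illustrative hashes:

```python
import hashlib

class CountingBloom:
    """Bloom variant with a small counter per cell, enabling deletion."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counts = [0] * m

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.counts[p] += 1            # increment instead of setting a bit

    def remove(self, item):
        for p in self._positions(item):
            if self.counts[p] > 0:         # guard against underflow
                self.counts[p] -= 1

    def might_contain(self, item):
        return all(self.counts[p] > 0 for p in self._positions(item))

cbf = CountingBloom(m=12, k=3)
cbf.add("alpha"); cbf.add("beta")
cbf.remove("alpha")
print(cbf.might_contain("beta"))   # True (maybe present)
print(cbf.might_contain("alpha"))  # likely False, unless positions overlap "beta"
```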
Cuckoo Filter
Hash-table-style filter that stores short fingerprints in two possible buckets; supports insert, lookup, delete with low false-positive rates and high load factors.
Example inputs and outputs
Setup
$$ b = 8 \ \text{buckets}, \quad \text{bucket size} = 2, \quad \text{fingerprint size} = 8 \ \text{bits} $$
Inserted set:
$$ {\text{"cat"}, \text{"dog"}, \text{"eel"}} $$
Each element is stored as a short fingerprint in one of two candidate buckets.
Example 1
$$ \text{Query: contains("cat")} $$
Fingerprint for "cat" is present in one of its candidate buckets.
$$ \text{Output: maybe present (true positive)} $$
Example 2
$$ \text{Query: contains("fox")} $$
Fingerprint for "fox" is absent from both candidate buckets.
$$ \text{Output: definitely not present} $$
Example 3 (Deletion)
$$ \text{Operation: remove("dog")} $$
Fingerprint for "dog" is removed from its bucket.
$$ \text{Result: deletion supported directly by removing the fingerprint} $$
How it works
Each key x → short fingerprint f = FP(x)
Two candidate buckets:
i1 = H(x) mod b
i2 = (i1 XOR H(f)) mod b
(f can be stored in either bucket; moving between buckets preserves the invariant.)
Start (empty)
[0]: [ -- , -- ] [1]: [ -- , -- ] [2]: [ -- , -- ] [3]: [ -- , -- ]
[4]: [ -- , -- ] [5]: [ -- , -- ] [6]: [ -- , -- ] [7]: [ -- , -- ]
Insert "cat"
f = 0xA7
i1 = 1
i2 = 1 XOR H(0xA7) = 5
Bucket 1 has a free slot → place 0xA7 in [1]
[1]: [ A7 , -- ]
Insert "dog"
f = 0x3C
i1 = 5
i2 = 5 XOR H(0x3C) = 2
Bucket 5 has a free slot → place 0x3C in [5]
[1]: [ A7 , -- ] [5]: [ 3C , -- ]
Insert "eel"
f = 0xD2
i1 = 1
i2 = 1 XOR H(0xD2) = 4
Bucket 1 has one free slot → place 0xD2 in [1]
[1]: [ A7 , D2 ] [5]: [ 3C , -- ]
Lookup "cat"
f = 0xA7
Buckets: i1 = 1, i2 = 5
Check: bucket[1] has A7 → found
Result: MAYBE PRESENT
Lookup "fox"
f = 0x9B
i1 = 0
i2 = 0 XOR H(0x9B) = 7
Check buckets 0 and 7 → fingerprint not found
Result: DEFINITELY NOT PRESENT
- Stores fingerprints, not full keys; answers maybe present / definitely not present.
- Supports deletion by removing a matching fingerprint from either bucket.
- Very high load factors (often 90%+ with small buckets) and excellent cache locality.
- False-positive rate controlled by fingerprint length (more bits → lower FPR).
- Insertions can trigger eviction chains; worst case requires a rehash/resize.
- Two buckets per item (or more in variants); lookups check a tiny, fixed set of places.
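A condensed sketch of the partial-key scheme; it assumes the bucket count is a power of two (so the XOR step is an involution) and invents its own fingerprint and hash choices:

```python
import hashlib
import random

def _h(x):
    """Stable integer hash derived from SHA-256 (this sketch's own choice)."""
    return int(hashlib.sha256(str(x).encode()).hexdigest(), 16)

class CuckooFilter:
    """Fingerprints in b buckets of size 2; b must be a power of two."""
    def __init__(self, b=8, bucket_size=2, max_kicks=50):
        self.b, self.bucket_size, self.max_kicks = b, bucket_size, max_kicks
        self.buckets = [[] for _ in range(b)]

    def _fingerprint(self, item):
        return _h(("fp", item)) % 255 + 1          # 8-bit, never zero

    def _indices(self, item, f):
        i1 = _h(item) % self.b
        i2 = (i1 ^ _h(f)) % self.b                 # partial-key cuckoo trick
        return i1, i2

    def insert(self, item):
        f = self._fingerprint(item)
        i1, i2 = self._indices(item, f)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(f)
                return True
        i = random.choice((i1, i2))                # both full: start evicting
        for _ in range(self.max_kicks):
            victim = random.randrange(len(self.buckets[i]))
            f, self.buckets[i][victim] = self.buckets[i][victim], f
            i = (i ^ _h(f)) % self.b               # alternate bucket of evicted f
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(f)
                return True
        return False                               # give up: rebuild/rehash needed

    def contains(self, item):
        f = self._fingerprint(item)
        i1, i2 = self._indices(item, f)
        return f in self.buckets[i1] or f in self.buckets[i2]

    def remove(self, item):
        f = self._fingerprint(item)
        for i in self._indices(item, f):
            if f in self.buckets[i]:
                self.buckets[i].remove(f)
                return True
        return False

cf = CuckooFilter()
for w in ("cat", "dog", "eel"):
    cf.insert(w)
print(cf.contains("cat"))   # True (maybe present)
print(cf.contains("fox"))   # likely False (definitely not present)
cf.remove("dog")
print(cf.contains("dog"))   # likely False once its fingerprint is removed
```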
String Search Algorithms
- KMP: Best all-rounder for guaranteed $O(n + m)$ and tiny memory.
- Boyer–Moore: Fastest in practice on long patterns / large alphabets due to big skips.
- Rabin–Karp: Great for many patterns or streaming; hashing enables batched checks.
- Naive: Fine for tiny inputs or as a baseline; simplest to reason about.
Naive String Search
Slide the pattern one position at a time over the text; at each shift compare characters left-to-right until a mismatch or a full match.
Example inputs and outputs
Example 1
$$ \text{Text: } "abracadabra", \quad \text{Pattern: } "abra" $$
$$ \text{Output: matches at indices } 0 \text{ and } 7 $$
Example 2
$$ \text{Text: } "aaaaa", \quad \text{Pattern: } "aaa" $$
$$ \text{Output: matches at indices } 0, 1, 2 $$
How it works
Text (length 11):
Text: a b r a c a d a b r a
Idx: 0 1 2 3 4 5 6 7 8 9 10
Pattern (length 4):
Pattern: a b r a
Shift 0
Text: a b r a
Pattern: a b r a
✓ All match → REPORT at index 0
Shift 1
Text: b r a c
Pattern: a b r a
✗ Mismatch at first char → advance
Shift 2
Text: r a c a
Pattern: a b r a
✗ Mismatch → advance
Shift 3
Text: a c a d
Pattern: a b r a
✗ Mismatch → advance
Shift 4
Text: c a d a
Pattern: a b r a
✗ Mismatch → advance
Shift 5
Text: a d a b
Pattern: a b r a
✗ Mismatch → advance
Shift 6
Text: d a b r
Pattern: a b r a
✗ Mismatch → advance
Shift 7
Text: a b r a
Pattern: a b r a
✓ All match → REPORT at index 7
- Works anywhere; no preprocessing.
- Time: worst/average $O(n \cdot m)$ (text length n, pattern length m).
- Space: $O(1)$.
- Good for very short patterns or tiny inputs; otherwise use KMP/BM/RK.
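Written directly in Python; slicing keeps the character-by-character comparison implicit:

```python
def naive_search(text, pattern):
    """Return every index where pattern starts in text."""
    n, m = len(text), len(pattern)
    matches = []
    for shift in range(n - m + 1):
        if text[shift:shift + m] == pattern:   # left-to-right comparison
            matches.append(shift)
    return matches

print(naive_search("abracadabra", "abra"))  # [0, 7]
print(naive_search("aaaaa", "aaa"))         # [0, 1, 2]
```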
Knuth–Morris–Pratt (KMP)
Precompute a table (LPS / prefix-function) for the pattern so that on a mismatch you "jump" the pattern to the longest proper prefix that is also a suffix, avoiding rechecks.
Example inputs and outputs
Example 1
$$ \text{Text: } "ababcabcabababd", \quad \text{Pattern: } "ababd" $$
$$ \text{Output: match at index } 10 $$
Example 2
$$ \text{Text: } "aaaaab", \quad \text{Pattern: } "aaab" $$
$$ \text{Output: match at index } 2 $$
How it works
We want to find the pattern "ababd" in the text "ababcabcabababd".
1) Precompute LPS (Longest Proper Prefix that is also a Suffix)
Pattern:
a b a b d
0 1 2 3 4 β index
LPS array:
0 0 1 2 0
Meaning:
- At each position: how many characters we can "fall back" within the pattern itself if a mismatch happens.
- Example: at index 3 (pattern "abab"), LPS=2 means that on a mismatch, comparison restarts from "ab" inside the pattern.
2) Scan Text with Two Pointers
- i = text index
- j = pattern index
Text:
a b a b c a b c a b a b a b d
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Pattern: a b a b d
Step A: Initial matches
i=0..3: "abab" matched → j=4 points to 'd'
Step B: Mismatch at i=4
text[i=4] = 'c'
pattern[j=4] = 'd' → mismatch
Instead of restarting, use LPS:
j = LPS[j-1] = LPS[3] = 2
So the pattern jumps back to "ab" (no wasted text comparisons).
i stays at 4.
Step C: Continue scanning
The algorithm keeps moving forward, reusing LPS whenever mismatches occur.
Step D: Full match found
At i=14, j advances to 5 (the pattern length).
→ FULL MATCH found!
Start index = i - m + 1 = 14 - 5 + 1 = 10
Pattern "ababd" occurs in the text starting at index 10. ✓
- Time: $O(n + m)$ (preprocessing + scan).
- Space: $O(m)$ for LPS table.
- Never moves i backward; avoids redundant comparisons.
- Ideal for repeated searches with the same pattern.
- LPS is also called prefix-function / failure-function.
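A standard Python implementation of the LPS table and the two-pointer scan described above:

```python
def build_lps(pattern):
    """Longest proper prefix that is also a suffix, for each prefix."""
    lps = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = lps[k - 1]                 # fall back within the pattern
        if pattern[i] == pattern[k]:
            k += 1
        lps[i] = k
    return lps

def kmp_search(text, pattern):
    """Return all match start indices in O(n + m)."""
    lps, matches, j = build_lps(pattern), [], 0
    for i, c in enumerate(text):
        while j > 0 and c != pattern[j]:
            j = lps[j - 1]                 # reuse the previous partial match
        if c == pattern[j]:
            j += 1
        if j == len(pattern):              # full match ending at i
            matches.append(i - j + 1)
            j = lps[j - 1]
    return matches

print(build_lps("ababd"))                      # [0, 0, 1, 2, 0]
print(kmp_search("ababcabcabababd", "ababd"))  # [10]
```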
Boyer–Moore (BM)
Compare the pattern right-to-left; on a mismatch, skip ahead using bad-character and good-suffix rules so many text characters are never touched.
Example inputs and outputs
Example 1
$$ \text{Text: } "HERE IS A SIMPLE EXAMPLE", \quad \text{Pattern: } "EXAMPLE" $$
$$ \text{Output: match at index } 17 $$
Example 2
$$ \text{Text: } "NEEDLE IN A HAYSTACK", \quad \text{Pattern: } "STACK" $$
$$ \text{Output: match at index } 15 $$
How it works
- Align the pattern under the text.
- Compare right → left.
- On mismatch, shift the pattern by the max of:
- Bad-character rule: align the mismatched text char with its last occurrence in the pattern (or skip it if absent).
- Good-suffix rule: if a suffix matched, align another occurrence of it (or a prefix).
Text (with spaces shown as _):
0 10 20 30
H E R E _ I S _ A _ S I M P L E _ E X A M P L E
Pattern: "EXAMPLE"
(length = 7)
Step 1: Align the pattern under text[9..15] = "_SIMPLE"
Text:    ... _ S I M P L E ...
Pattern:     E X A M P L E
                         ↑ (start comparing right → left)
Compare right-to-left:
E=E, L=L, P=P, M=M,
A vs I → mismatch
- Bad character = I (from the text).
- Does the pattern contain I? → no.
- → Shift the pattern past I.
Step 2: Shift until pattern under "EXAMPLE"
Text: ... E X A M P L E
Pattern: E X A M P L E
Compare right-to-left:
E=E, L=L, P=P, M=M, A=A, X=X, E=E
✓ Full match found at index 17.
- Average case sublinear (often skips large chunks of text).
- Worst case can be $O(n \cdot m)$; with both rules + Galil's optimization, comparisons can be bounded by $O(n + m)$.
- Space: $O(\sigma + m)$ for tables (σ = alphabet size).
- Shines on long patterns over large alphabets (e.g., English text, logs).
- Careful table prep (bad-character & good-suffix) is crucial.
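A sketch implementing only the bad-character rule; full Boyer–Moore adds the good-suffix table, so this variant skips less aggressively but reports the same matches:

```python
def bm_bad_char_search(text, pattern):
    """Boyer-Moore with the bad-character rule only (good-suffix omitted)."""
    n, m = len(text), len(pattern)
    last = {c: i for i, c in enumerate(pattern)}      # last occurrence table
    matches, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:   # compare right to left
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1
        else:
            bad = text[s + j]
            s += max(1, j - last.get(bad, -1))        # skip past the bad char
    return matches

print(bm_bad_char_search("HERE IS A SIMPLE EXAMPLE", "EXAMPLE"))  # [17]
print(bm_bad_char_search("NEEDLE IN A HAYSTACK", "STACK"))        # [15]
```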
Rabin–Karp (RK)
Compare rolling hashes of the current text window and the pattern; only if hashes match do a direct character check (to rule out collisions).
Example inputs and outputs
Example 1
$$ \text{Text: } "ABCDABCABCD", \quad \text{Pattern: } "ABC" $$
$$ \text{Output: matches at indices } 0, 4, 7 $$
Example 2
$$ \text{Text: } "ABCDE", \quad \text{Pattern: } "FG" $$
$$ \text{Output: no match} $$
How it works
We'll use the classic choices:

- Base B = 256
- Modulus M = 101 (prime)
- Character value val(c) = ASCII(c) (e.g., A=65, B=66, C=67, D=68)
- Pattern P = "ABC" (length m = 3)
- Text T = "ABCDABCABCD" (length 11)
Text: A B C D A B C A B C D
Index: 0 1 2 3 4 5 6 7 8 9 10
Pattern: A B C (m = 3)
Precompute
pow = B^(m-1) mod M = 256^2 mod 101 = 88
HP = H(P) = H("ABC")
Start with h=0, then for each character: h = (B*h + val) mod M
- After 'A': (256*0 + 65) % 101 = 65
- After 'B': (256*65 + 66) % 101 = 41
- After 'C': (256*41 + 67) % 101 = 59
So HP = 59.
Rolling all windows
Initial window T[0..2] = "ABC":
h0 = 59 (matches HP → verify chars → ✓ match at 0)
For rolling:
h_next = ( B * (h_curr − val(left) * pow) + val(new) ) mod M
(If the inner term is negative, add a multiple of M before multiplying.)
First two rolls
From $[0..2]$ "ABC" $(h_0=59)$ → $[1..3]$ "BCD":
left='A'(65), new='D'(68)
inner = (59 − 65*88) mod 101 = (59 − 5720) mod 101 = 96
h1 = (256*96 + 68) mod 101 = 24644 mod 101 = 0
From $[3..5]$ "DAB" $(h_3=66)$ → $[4..6]$ "ABC":
left='D'(68), new='C'(67)
inner = (66 − 68*88) mod 101 = (66 − 5984) mod 101 = 41
h4 = (256*41 + 67) mod 101 = 10563 mod 101 = 59 (= HP)
All windows (start index s)
s   window   hash   = HP?
0   ABC      59     ✓ → verify → MATCH at 0
1   BCD       0
2   CDA      38
3   DAB      66
4   ABC      59     ✓ → verify → MATCH at 4
5   BCA      98
6   CAB      79
7   ABC      59     ✓ → verify → MATCH at 7
8   BCD       0

Matches at indices: 0, 4, 7.
- Expected time $O(n + m)$ with a good modulus and low collision rate; worst case $O(n \cdot m)$ if many collisions.
- Space: $O(1)$ beyond the text/pattern and precomputed powers.
- Excellent for multi-pattern search (compute many pattern hashes, reuse rolling windows).
- Choose modulus to reduce collisions; verify on hash hits to ensure correctness.
- Works naturally on streams/very large texts since it needs only the current window.
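The whole procedure in Python, verifying characters on every hash hit:

```python
def rabin_karp(text, pattern, B=256, M=101):
    """Rolling-hash search; verify characters on every hash hit."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    power = pow(B, m - 1, M)                        # B^(m-1) mod M
    hp = ht = 0
    for i in range(m):                              # initial hashes
        hp = (B * hp + ord(pattern[i])) % M
        ht = (B * ht + ord(text[i])) % M
    matches = []
    for s in range(n - m + 1):
        if ht == hp and text[s:s + m] == pattern:   # rule out collisions
            matches.append(s)
        if s < n - m:                               # roll the window one step
            ht = (B * (ht - ord(text[s]) * power) + ord(text[s + m])) % M
    return matches

print(rabin_karp("ABCDABCABCD", "ABC"))  # [0, 4, 7]
print(rabin_karp("ABCDE", "FG"))         # [] (no match)
```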