Last modified: September 13, 2025
This article is written in: πΊπΈ
Searching refers to the process of finding the location of a specific element within a collection of data, such as an array, list, tree, or graph. It underpins many applications, from databases and information retrieval to routing and artificial intelligence. Depending on the organization of the data, different search techniques are usedβsuch as linear search for unsorted data, binary search for sorted data, and more advanced approaches like hash-based lookup or tree traversals for hierarchical structures. Efficient searching is important because it directly impacts the performance and scalability of software systems.
Scan the list from left to right, comparing the target with each element until you either find a match (return its index) or finish the list (report βnot foundβ).
Example inputs and outputs
Example 1
$$ \text{Input: } [7, 3, 5, 2, 9], \quad \text{target} = 5 $$
$$ \text{Output: } \text{index} = 2 $$
Example 2
$$ \text{Input: } [4, 4, 4], \quad \text{target} = 4 $$
$$ \text{Output: } \text{index} = 0 (\text{first match}) $$
Example 3
$$ \text{Input: } [10, 20, 30], \quad \text{target} = 25 $$
$$ \text{Output: } \text{not found} $$
How Linear Search Works
We start at index 0
, compare the value with the target, and keep moving right until we either find it or reach the end.
Target 5 in [7, 3, 5, 2, 9]
Indexes: 0 1 2 3 4
List: [7] [3] [5] [2] [9]
Target: 5
Step 1: pointer at index 0
|
v
7 3 5 2 9
β compare 7 vs 5 β no
Step 2: pointer moves to index 1
|
v
7 3 5 2 9
β compare 3 vs 5 β no
Step 3: pointer moves to index 2
|
v
7 3 5 2 9
β compare 5 vs 5 β YES β
β return index 2
Worst Case (Not Found)
Target 9 in [1, 2, 3]
Indexes: 0 1 2
List: [1] [2] [3]
Target: 9
Checks:
β 1 β 9
β 2 β 9
β 3 β 9
β end
β not found β
Place one copy of the target at the very end as a βsentinelβ so the scan can run without checking bounds each step; afterward, decide whether the match was inside the original list or only at the sentinel position.
Example inputs and outputs
Example 1
$$ \text{Input: } [12, 8, 6, 15], \quad \text{target} = 6 $$
$$ \text{Output: } \text{index} = 2 $$
Example 2
$$ \text{Input: } [2, 4, 6, 8], \quad \text{target} = 5 $$
$$ \text{Output: } \text{not found } (\text{only the sentinel matched}) $$
How it works
Put the target at one extra slot at the end so the loop is guaranteed to stop on a match; afterward, check whether the match was inside the original range.
Target 11 not in the list
Original list (n=5):
[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ]
Target: 11
Add sentinel (extra slot):
[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ][ 11 ]
0 1 2 3 4 5 β sentinel
Scan step by step:
4 β 11 β pointer at 0
9 β 11 β pointer at 1
1 β 11 β pointer at 2
7 β 11 β pointer at 3
6 β 11 β pointer at 4
11 = 11 β pointer at 5 (sentinel)
Therefore, not found in original list.
Target 6 inside the list
Original list (n=4):
[ 12 ][ 8 ][ 6 ][ 15 ]
Target: 6
Add sentinel:
[ 12 ][ 8 ][ 6 ][ 15 ][ 6 ]
0 1 2 3 4
Scan:
12 β 6 β index 0
8 β 6 β index 1
6 = 6 β index 2 β
On a sorted array, repeatedly halve the search interval by comparing the target to the middle element until found or the interval is empty.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [2, 5, 8, 12, 16, 23, 38], \quad \text{target} = 16 $$
$$ \text{Output: } \text{index} = 4 $$
Example 2
$$ \text{Input: } A = [1, 3, 3, 3, 9], \quad \text{target} = 3 $$
$$ \text{Output: } \text{index} = 2 \quad (\text{any valid match; first/last needs a variant}) $$
Example 3
$$ \text{Input: } A = [10, 20, 30, 40], \quad \text{target} = 35 $$
$$ \text{Output: } \text{not found} $$
How it works
We repeatedly check the middle element, and then discard half the list based on comparison.
Find 16 in:
A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i = 0 1 2 3 4 5 6
Step 1
low = 0, high = 6
mid = (0+6)//2 = 3
A[3] = 12 < 16 β target is to the RIGHT β new low = mid + 1 = 4
A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i = 0 1 2 3 4 5 6
βL βM βH
0 3 6
Active range: indices 0..6
Step 2
low = 4, high = 6
mid = (4+6)//2 = 5
A[5] = 23 > 16 β target is to the LEFT β new high = mid - 1 = 4
A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i = 0 1 2 3 4 5 6
βL βM βH
4 5 6
Active range: indices 4..6
Step 3
low = 4, high = 4
mid = 4
A[4] = 16 == target β
A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i = 0 1 2 3 4 5 6
βLMH
4
Active range: indices 4..4
FOUND at index 4
Like binary, but splits the current interval into three parts using two midpoints; used mainly for unimodal functions or very specific array cases.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [1, 4, 7, 9, 12, 15], \quad \text{target} = 9 $$
$$ \text{Output: } \text{index} = 3 $$
Example 2
$$ \text{Input: } A = [2, 6, 10, 14], \quad \text{target} = 5 $$
$$ \text{Output: } \text{not found} $$
How it works
We divide the array into three parts using two midpoints m1
and m2
.
target < A[m1]
β search $[low .. m1-1]$target > A[m2]
β search $[m2+1 .. high]$
A = [ 1 ][ 4 ][ 7 ][ 9 ][ 12 ][ 15 ]
i = 0 1 2 3 4 5
Target: 9
Step 1
low = 0, high = 5
m1 = low + (high - low)//3 = 0 + (5)//3 = 1
m2 = high - (high - low)//3 = 5 - (5)//3 = 3
A[m1] = 4
A[m2] = 9
A = [ 1 ][ 4 ][ 7 ][ 9 ][ 12 ][ 15 ]
i = 0 1 2 3 4 5
βL βm1 βm2 βH
0 1 3 5
FOUND at index 3
On a sorted array, jump ahead in fixed block sizes to find the block that may contain the target, then do a linear scan inside that block.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [1, 4, 9, 16, 25, 36, 49], \quad \text{target} = 25, \quad \text{jump} = \lfloor \sqrt{7} \rfloor = 2 $$
$$ \text{Output: } \text{index} = 4 $$
Example 2
$$ \text{Input: } A = [3, 8, 15, 20, 22, 27], \quad \text{target} = 21, \quad \text{jump} = 2 $$
$$ \text{Output: } \text{not found} $$
How it works
Perfect β thatβs a jump search trace. Let me reformat and polish it so the steps are crystal clear and the βjump + linear scanβ pattern pops visually:
Weβre applying jump search to find $25$ in
$$ A = [1, 4, 9, 16, 25, 36, 49] $$
with $n=7$, block size $\approx \sqrt{7} \approx 2$, so jump=2.
We probe every 2nd index:
So target is in block $(2..4]$
[ 1 ][ 4 ] | [ 9 ][16 ] | [25 ][36 ] | [49 ]
^ ^ ^ ^
probe=0 probe=2 probe=4 probe=6
Linear Scan in block (indexes 3..4)
Block [16 ][25 ]
^ ^
i=3 i=4 (found!)
The element $25$ is found at index 4.
On a sorted array, grow the right boundary exponentially (1, 2, 4, 8, β¦) to find a containing range, then finish with binary search in that range.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [2, 3, 5, 7, 11, 13, 17, 19, 23], \quad \text{target} = 19 $$
$$ \text{Output: } \text{index} = 7 $$
Example 2
$$ \text{Input: } A = [10, 20, 30, 40, 50], \quad \text{target} = 12 $$
$$ \text{Output: } \text{not found} $$
How it works
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i = 0 1 2 3 4 5 6 7 8
Target = 19
Find range by exponential jumps
Start at i=1
, double each step until A[i] β₯ target
(or end).
Jump 1: i=1
A[i]=3 β€ 19 β continue
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i = 0 1 2 3 4 5 6 7 8
β
Jump 2: i=2
A[i]=5 β€ 19 β continue
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i = 0 1 2 3 4 5 6 7 8
β
Jump 3: i=4
A[i]=11 β€ 19 β continue
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i = 0 1 2 3 4 5 6 7 8
β
Jump 4: i=8
A[i]=23 > 19 β stop
Range is (previous power of two .. i] = (4 .. 8] β search indices 5..8
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i = 0 1 2 3 4 5 6 7 8
β
Range for binary search: low=5, high=8
.
Binary search on $A[5..8]$
Subarray: [ 13 ][ 17 ][ 19 ][ 23 ]
Indices : 5 6 7 8
Step 1
low=5, high=8 β mid=(5+8)//2=6
A[6]=17 < 19 β move right β low=7
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i = 0 1 2 3 4 5 6 7 8
βL βM βH
5 6 8
Step 2
low=7, high=8 β mid=(7+8)//2=7
A[7]=19 == target β
β FOUND
A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i = 0 1 2 3 4 5 6 7 8
βL M H
7
Found at index 7.
On a sorted (roughly uniformly distributed) array, estimate the likely position using the values themselves and probe there; repeat on the narrowed side.
Example inputs and outputs
Example 1
$$ \text{Input: } A = [10, 20, 30, 40, 50, 60, 70], \quad \text{target} = 55 $$
$$ \text{Output: } \text{not found } (\text{probes near indices } 4\text{β}5) $$
Example 2
$$ \text{Input: } A = [5, 15, 25, 35, 45, 55, 65], \quad \text{target} = 45 $$
$$ \text{Output: } \text{index} = 4 $$
Example 3
$$ \text{Input: } A = [1, 1000, 1001, 1002], \quad \text{target} = 2 $$
$$ \text{Output: } \text{not found } (\text{bad distribution for interpolation}) $$
How it works
A[high] == A[low]
, stop (or binary-search fallback).pos
to [low, high]
before probing.A
is sorted and values are uniform.Probe formula:
pos β low + (high - low) * (target - A[low]) / (A[high] - A[low])
Let say we have following array and target:
A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
i = 0 1 2 3 4 5 6
target = 45
Step 1 β initial probe
low=0 (A[0]=10), high=6 (A[6]=70)
pos β 0 + (6-0) * (45-10)/(70-10)
β 6 * 35/60
β 3.5 β probe near 3.5
A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
i = 0 1 2 3 4 5 6
βL βH
βposβ3.5 β choose βposβ=3 (or βposβ=4)
Probe index 3: A[3]=40 < 45
β set low = 3 + 1 = 4
Step 2 β after moving low
A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
i = 0 1 2 3 4 5 6
βL βH
At this point, an early-stop check already tells us target (45) < A[low] (50)
β cannot exist in A[4..6]
β not found.
Map a key to an array index with a hash function; look at that bucket to find the key, giving expected $O(1)$ lookups under a good hash and healthy load factor.
Example inputs and outputs
Example 1
$$ \text{Table size: } m = 7, \quad \text{Keys stored: } {10, 24, 31}, \quad \text{Target: } 24 $$
$$ \text{Output: } \text{found (bucket 3)} $$
Example 2
$$ \text{Table size: } m = 7, \quad \text{Keys stored: } {10, 24, 31}, \quad \text{Target: } 18 $$
$$ \text{Output: } \text{not found} $$
How it works
+-----+ hash +-----------------+ search/compare +--------+
| key | --------------> | index in array | ----------------------> | match? |
+-----+ +-----------------+ +--------+
Array (buckets/indexes 0..6):
Idx: 0 1 2 3 4 5 6
+---+-----+-----+-----+-----+-----+-----+
| | | | | | | |
+---+-----+-----+-----+-----+-----+-----+
Example mapping with h(k) = k mod 7
, stored keys {10, 24, 31}
all hash to index 3
.
Strategy A β Separate Chaining (linked list per bucket)
Insertions
10 -> 3
24 -> 3 (collides with 10; append to bucket[3] list)
31 -> 3 (collides again; append to bucket[3] list)
Idx: 0 1 2 3 4 5 6
+---+-----+-----+-----+-----+-----+-----+
| | | | β’ | | | |
+---+-----+-----+-----+-----+-----+-----+
bucket[3] chain: [10] β [24] β [31] β β
Search(24)
1) Compute index = h(24) = 3
2) Inspect bucket 3's chain:
[10] β [24] β [31]
β found here
3) Return FOUND (bucket 3)
Strategy B β Open Addressing (Linear Probing)
Insertions
10 -> 3 place at 3
24 -> 3 (occupied) β probe 4 β place at 4
31 -> 3 (occ) β 4 (occ) β probe 5 β place at 5
Idx: 0 1 2 3 4 5 6
+---+-----+-----+-----+-----+-----+-----+
| | | | 10 | 24 | 31 | |
+---+-----+-----+-----+-----+-----+-----+
Search(24)
1) Compute index = h(24) = 3
2) Probe sequence:
3: 10 β 24 β continue
4: 24 = target β FOUND at index 4
(If not found, continue probing until an empty slot or wrap limit.)
Keep everything in one array; on collision, probe alternative positions in a deterministic sequence until an empty slot or the key is found.
Example inputs and outputs
Example 1
$$ m = 10, \quad \text{Stored keys: } {12, 22, 32}, \quad \text{Target: } 22 $$
$$ \text{Output: } \text{found (index 3)} $$
Example 2
$$ m = 10, \quad \text{Stored keys: } {12, 22, 32}, \quad \text{Target: } 42 $$
$$ \text{Output: } \text{not found} $$
How it works
Hash function:
h(k) = k mod 10
Probe sequence: i, i+1, i+2, ... (wrap around)
Insertions
h(12)=2
β place at index 2h(22)=2
occupied β probe 3 β place at 3h(32)=2
occupied β probe 3 (occupied) β probe 4 β place at 4Resulting table (indexes 0..9):
Index: 0 1 2 3 4 5 6 7 8 9
+---+---+----+----+----+---+---+---+---+---+
Value: | | | 12 | 22 | 32 | | | | | |
+---+---+----+----+----+---+---+---+---+---+
Search(22)
h(22)=2
Path followed:
2 β 3
Search(42)
h(42)=2
Path followed:
2 β 3 β 4 β 5 (β
)
Example inputs and outputs
Example 1
$$ m = 11 (\text{prime}), \quad \text{Stored keys: } {22, 33, 44}, \quad \text{Target: } 33 $$
$$ h(k) = k \bmod m, \quad h(33) = 33 \bmod 11 = 0 $$
$$ \text{Output: found (index 4)} $$
Example 2
$$ m = 11 (\text{prime}), \quad \text{Stored keys: } {22, 33, 44}, \quad \text{Target: } 55 $$
$$ h(55) = 55 \bmod 11 = 0 $$
$$ \text{Output: not found} $$
How it works
Hash function:
h(k) = k mod 11
Probe sequence (relative offsets):
+1Β², +2Β², +3Β², ... mod 11
= +1, +4, +9, +5, +3, +3Β²β¦ (wrapping around table size)
So from h(k)
, we try slots in this order:
h, h+1, h+4, h+9, h+5, h+3, ... (all mod 11)
Insertions
h(22)=0
β place at index 0h(33)=0
occupied β try 0+1Β²=1
β index 1 free β place at 1h(44)=0
occupied β probe 1 (occupied) β probe 0+4=4
β place at 4Resulting table:
Idx: 0 1 2 3 4 5 6 7 8 9 10
+----+----+---+---+----+---+--+--+--+---+
Val: | 22 | 33 | | | 44 | | | | | | |
+----+----+---+---+----+---+--+--+--+--+---+
Search(33)
h(33)=0
β slot 0 = 22 β 330+1Β²=1
β slot 1 = 33 β
FOUNDPath:
0 β 1
Search(55)
h(55)=0
β slot 0 = 22 β 550+1Β²=1
β slot 1 = 33 β 550+2Β²=4
β slot 4 = 44 β 550+3Β²=9
β slot 9 = empty β stop β β NOT FOUNDPath:
0 β 1 β 4 β 9 (β
)
Example inputs and outputs
Hash functions
$$ h_{1}(k) = k \bmod 11, \quad h_{2}(k) = 1 + (k \bmod 10) $$
Probing sequence:
$$ h(k,i) = \big(h_{1}(k) + i \cdot h_{2}(k)\big) \bmod 11 $$
Example 1
$$ m = 11, \quad \text{Stored keys: } {22, 33, 44}, \quad \text{Target: } 33 $$
For $k = 33$:
$$ h_{1}(33) = 33 \bmod 11 = 0, \quad h_{2}(33) = 1 + (33 \bmod 10) = 1 + 3 = 4 $$
So probe sequence is
$$ h(33,0) = 0, h(33,1) = (0 + 1\cdot 4) \bmod 11 = 4, h(33,2) = (0 + 2\cdot 4) \bmod 11 = 8, \dots $$
Since the stored layout places $33$ at index $4$, the search succeeds.
$$ \text{Output: found (index 4)} $$
Example 2
$$ m = 11, \quad \text{Stored keys: } {22, 33, 44}, \quad \text{Target: } 55 $$
For $k = 55$:
$$ h_{1}(55) = 55 \bmod 11 = 0, \quad h_{2}(55) = 1 + (55 \bmod 10) = 1 + 5 = 6 $$
Probing sequence:
$$ 0, (0+6)\bmod 11 = 6, (0+2\cdot 6)\bmod 11 = 1, (0+3\cdot 6)\bmod 11 = 7, \dots $$
No slot matches $55$.
$$ \text{Output: not found} $$
How it works
We use two hash functions:
hβ(k) = k mod m
hβ(k) = 1 + (k mod 10)
Probe sequence:
i, i + hβ, i + 2Β·hβ, i + 3Β·hβ, ... (all mod m)
This ensures fewer clustering issues compared to linear or quadratic probing.
Insertions (m = 11)
Insert 22
hβ(22)=0
β place at index 0Insert 33
hβ(33)=0
(occupied)hβ(33)=1+(33 mod 10)=4
Insert 44
hβ(44)=0
(occupied)hβ(44)=1+(44 mod 10)=5
Table State
Idx: 0 1 2 3 4 5 6 7 8 9 10
+---+---+---+---+---+---+---+---+---+---+
Val: |22 | | | |33 |44 | | | | | |
+---+---+---+---+---+---+---+---+---+---+---+
Search(33)
hβ(33)=0
β slot 0 = 22 β 330+1Β·hβ(33)=0+4=4
β slot 4 = 33 β
FOUNDPath:
0 β 4
Search(55)
hβ(55)=0
, hβ(55)=1+(55 mod 10)=6
Path:
0 β 6 (β
)
Each array cell holds a small container (e.g., a linked list); colliding keys live together in that bucket.
Example inputs and outputs
Setup
$$ m = 5, \quad h(k) = k \bmod 5, \quad \text{buckets hold linked lists} $$
Keys stored:
$$ {12, 22, 7, 3, 14} $$
Bucket contents after hashing:
$$ \begin{aligned} h(12) &= 12 \bmod 5 = 2 \Rightarrow& \text{bucket 2: } [12] [6pt] h(22) &= 22 \bmod 5 = 2 \Rightarrow& \text{bucket 2: } [12, 22] [6pt] h(7) &= 7 \bmod 5 = 2 \Rightarrow& \text{bucket 2: } [12, 22, 7] [6pt] h(3) &= 3 \bmod 5 = 3 \Rightarrow& \text{bucket 3: } [3] [6pt] h(14) &= 14 \bmod 5 = 4 \Rightarrow& \text{bucket 4: } [14] \end{aligned} $$
Example 1
$$ \text{Target: } 22 $$
$$ h(22) = 2 \Rightarrow \text{bucket 2} = [12, 22, 7] $$
Found at position 2 in the list.
$$ \text{Output: found (bucket 2, position 2)} $$
Example 2
$$ \text{Target: } 9 $$
$$ h(9) = 9 \bmod 5 = 4 \Rightarrow \text{bucket 4} = [14] $$
No match.
$$ \text{Output: not found} $$
How it works
h(k) = k mod 5
Buckets store small lists (linked lists or dynamic arrays)
Idx: 0 1 2 3 4
[ ] [ ] [ 12 β 22 β 7 ] [ 3 ] [ 14 ]
Search(22):
- Compute bucket b = h(22) = 2
- Linearly scan bucket 2 β find 22
Search(9):
- b = h(9) = 4
- Bucket 4: [14] β 9 not present β NOT FOUND
Keep two (or more) hash positions per key; insert by βkicking outβ occupants to their alternate home so lookups check only a couple of places.
Example inputs and outputs
Setup Two hash tables $T_{1}$ and $T_{2}$, each of size
$$ m = 5 $$
Two independent hash functions:
$$ h_{1}(k), \quad h_{2}(k) $$
Cuckoo hashing invariant:
Example 1
$$ \text{Target: } 15 $$
Lookup procedure:
Result:
$$ \text{found in } T_{2} \text{ at index } 4 $$
$$ \text{Output: found (Tβ, index 4)} $$
Example 2
If insertion causes repeated displacements and eventually loops:
$$ \text{Cycle detected } \Rightarrow \text{rehash with new } h_{1}, h_{2} $$
$$ \text{Output: rebuild / rehash required} $$
How it works
We keep two hash tables (Tβ, Tβ), each with its own hash function. Every key can live in exactly one of two possible slots:
Hash functions:
$$ h_1(k) = k \bmod 5, \quad h_2(k) = 1 + (k \bmod 4) $$
Every key can live in exactly one of two slots: $T_1[h_1(k)]$ or $T_2[h_2(k)]$. If a slot is occupied, we evict the old occupant and reinsert it at its alternate location.
Start empty:
Tβ: [ ][ ][ ][ ][ ]
Tβ: [ ][ ][ ][ ][ ]
Insert 10 β goes to $T_1[h_1(10)=0]$:
Tβ: [10 ][ ][ ][ ][ ]
Tβ: [ ][ ][ ][ ][ ]
Insert 15
Tβ: [15 ][ ][ ][ ][ ]
Tβ: [ ][ ][ ][10 ][ ]
Insert 20
Tβ: [20 ][ ][ ][ ][ ]
Tβ: [ ][ ][ ][10 ][15 ]
Insert 25
Tβ: [25 ][ ][ ][ ][ ]
Tβ: [ ][20 ][ ][10 ][15 ]
π Search(15)
FOUND in Tβ at index 4
Space-efficient structure for fast membership tests; answers βmaybe presentβ or βdefinitely not presentβ with a tunable false-positive rate and no false negatives (if built correctly, without deletions).
Example inputs and outputs
Setup
$$ m = 16 \text{bits}, \quad k = 3 \text{hash functions } (h_{1}, h_{2}, h_{3}) $$
Inserted set:
$$ {"cat", "dog"} $$
Example 1
$$ \text{Query: contains("cat")} $$
All $h_{i}(\text{"cat"})$ bits are set β actual member.
$$ \text{Output: maybe present (true positive)} $$
Example 2
$$ \text{Query: contains("cow")} $$
One probed bit = 0 β cannot be present.
$$ \text{Output: definitely not present} $$
Example 3
$$ \text{Query: contains("eel")} $$
All $h_{i}(\text{"eel"})$ bits happen to be set, even though "eel" was never inserted.
$$ \text{Output: maybe present (false positive)} $$
How it works
Initial state (all zeros):
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Insert "cat"
h1(cat) = 3, h2(cat) = 7, h3(cat) = 12
β Set bits at 3, 7, 12
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0]
^ ^ ^
3 7 12
Insert "dog"
h1(dog) = 1, h2(dog) = 7, h3(dog) = 9
β Set bits at 1, 7, 9 (7 already set)
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0]
^ ^ ^
1 7 9
Query "cow"
h1(cow) = 1 β bit[1] = 1
h2(cow) = 3 β bit[3] = 1
h3(cow) = 6 β bit[6] = 0 β
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0]
β β β
At least one zero β DEFINITELY NOT PRESENT
Query "eel"
h1(eel) = 7 β bit[7] = 1
h2(eel) = 9 β bit[9] = 1
h3(eel) = 12 β bit[12] = 1
Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0]
β β β
All ones β MAYBE PRESENT (could be a false positive)
Bloom filter variant that keeps a small counter per bit so you can delete by decrementing; still probabilistic and may have false positives.
Example inputs and outputs
Setup
$$ m = 12 \text{counters (each 2β4 bits)}, \quad k = 3 \text{hash functions} $$
Inserted set:
$$ {\text{"alpha"}, \text{"beta"}} $$
Then delete "alpha"
.
Example 1
$$ \text{Query: contains("alpha")} $$
Counters for "alpha"
decremented; at least one probed counter is now $0$.
$$ \text{Output: definitely not present} $$
Example 2
$$ \text{Query: contains("beta")} $$
All three counters for "beta"
remain $>0$.
$$ \text{Output: maybe present} $$
Example 3
$$ \text{Query: contains("gamma")} $$
At least one probed counter is $0$.
$$ \text{Output: definitely not present} $$
How it works
Each cell is a small counter (e.g. 4-bits, range 0..15). This allows deletions: increment on insert, decrement on delete.
Initial state
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 0 0 0 0 0 0 0 0 0 0]
Insert "alpha"
Hashes: {2, 5, 9}
β Increment those counters
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 1 0 0 1 0 0 0 1 0 0]
β β β
2 5 9
Insert "beta"
Hashes: {3, 5, 11}
β Increment those counters
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 1 1 0 2 0 0 0 1 0 1]
β β β
3 5 11
Lookup "beta"
Hashes: {3, 5, 11}
Counters = {1, 2, 1} β all > 0
β Result: MAYBE PRESENT
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 1 1 0 2 0 0 0 1 0 1]
β β β
Delete "alpha"
Hashes: {2, 5, 9}
β Decrement those counters
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 0 1 0 1 0 0 0 0 0 1]
β β β
2 5 9
Lookup "alpha"
Hashes: {2, 5, 9}
Counters = {0, 1, 0}
β At least one zero
β Result: DEFINITELY NOT PRESENT
Idx: 0 1 2 3 4 5 6 7 8 9 10 11
A = [0 0 0 1 0 1 0 0 0 0 0 1]
β β β
Hash-tableβstyle filter that stores short fingerprints in two possible buckets; supports insert, lookup, delete with low false-positive rates and high load factors.
Example inputs and outputs
Setup
$$ b = 8 \text{buckets}, \quad \text{bucket size} = 2, \quad \text{fingerprint size} = 8 \text{bits} $$
Inserted set:
$$ {\text{"cat"}, \text{"dog"}, \text{"eel"}} $$
Each element is stored as a short fingerprint in one of two candidate buckets.
Example 1
$$ \text{Query: contains("cat")} $$
Fingerprint for "cat"
is present in one of its candidate buckets.
$$ \text{Output: maybe present (true positive)} $$
Example 2
$$ \text{Query: contains("fox")} $$
Fingerprint for "fox"
is absent from both candidate buckets.
$$ \text{Output: definitely not present} $$
Example 3 (Deletion)
$$ \text{Operation: remove("dog")} $$
Fingerprint for "dog"
is removed from its bucket.
$$ \text{Result: deletion supported directly by removing the fingerprint} $$
How it works
Each key x
β short fingerprint f = FP(x)
Two candidate buckets:
i1 = H(x) mod b
i2 = i1 XOR H(f) mod b
(f
can be stored in either bucket; moving between buckets preserves the invariant.)Start (empty)
[0]: [ -- , -- ] [1]: [ -- , -- ] [2]: [ -- , -- ] [3]: [ -- , -- ]
[4]: [ -- , -- ] [5]: [ -- , -- ] [6]: [ -- , -- ] [7]: [ -- , -- ]
Insert "cat"
f = 0xA7
i1 = 1
i2 = 1 XOR H(0xA7) = 5
Bucket 1 has free slot β place 0xA7 in [1]
[1]: [ A7 , -- ]
Insert "dog"
f = 0x3C
i1 = 5
i2 = 5 XOR H(0x3C) = 2
Bucket 5 has free slot β place 0x3C in [5]
[1]: [ A7 , -- ] [5]: [ 3C , -- ]
Insert "eel"
f = 0xD2
i1 = 1
i2 = 1 XOR H(0xD2) = 4
Bucket 1 has one free slot β place 0xD2 in [1]
[1]: [ A7 , D2 ] [5]: [ 3C , -- ]
Lookup "cat"
f = 0xA7
Buckets: i1 = 1, i2 = 5
Check: bucket[1] has A7 β found
Result: MAYBE PRESENT
Lookup "fox"
f = 0x9B
i1 = 0
i2 = 0 XOR H(0x9B) = 7
Check buckets 0 and 7 β fingerprint not found
Result: DEFINITELY NOT PRESENT
Slide the pattern one position at a time over the text; at each shift compare characters left-to-right until a mismatch or a full match.
Example inputs and outputs
Example 1
$$ \text{Text: } "abracadabra", \quad \text{Pattern: } "abra" $$
$$ \text{Output: matches at indices } 0 \text{and} 7 $$
Example 2
$$ \text{Text: } "aaaaa", \quad \text{Pattern: } "aaa" $$
$$ \text{Output: matches at indices } 0, 1, 2 $$
How it works
Text (length 11):
Text: a b r a c a d a b r a
Idx: 0 1 2 3 4 5 6 7 8 9 10
Pattern (length 4):
Pattern: a b r a
Shift 0
Text: a b r a
Pattern: a b r a
β All match β REPORT at index 0
Shift 1
Text: b r a c
Pattern: a b r a
β Mismatch at first char β advance
Shift 2
Text: r a c a
Pattern: a b r a
β Mismatch β advance
Shift 3
Text: a c a d
Pattern: a b r a
β Mismatch β advance
Shift 4
Text: c a d a
Pattern: a b r a
β Mismatch β advance
Shift 5
Text: a d a b
Pattern: a b r a
β Mismatch β advance
Shift 6
Text: d a b r
Pattern: a b r a
β Mismatch β advance
Shift 7
Text: a b r a
Pattern: a b r a
β All match β REPORT at index 7
Precompute a table (LPS / prefix-function) for the pattern so that on a mismatch you βjumpβ the pattern to the longest proper prefix that is also a suffix, avoiding rechecks.
Example inputs and outputs
Example 1
$$ \text{Text: } "ababcabcabababd", \quad \text{Pattern: } "ababd" $$
$$ \text{Output: match at index } 10 $$
Example 2
$$ \text{Text: } "aaaaab", \quad \text{Pattern: } "aaab" $$
$$ \text{Output: match at index } 2 $$
How it works
We want to find the pattern "ababd"
in the text "ababcabca babab d"
.
1) Precompute LPS (Longest Proper Prefix that is also a Suffix)
Pattern:
a b a b d
0 1 2 3 4 β index
LPS array:
0 0 1 2 0
Meaning:
"abab"
), LPS=2 means if mismatch occurs, restart comparison from "ab"
inside the pattern.2) Scan Text with Two Pointers
i
= text indexj
= pattern indexText:
a b a b c a b c a b a b a b d
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Pattern: a b a b d
Step A: Initial matches
i=0..3: "abab" matched β j=4 points to 'd'
Step B: Mismatch at i=4
text[i=4] = 'c'
pattern[j=4] = 'd' β mismatch
Instead of restarting, use LPS:
j = LPS[j-1] = LPS[3] = 2
So pattern jumps back to "ab"
(no wasted text comparisons).
i stays at 4.
Step C: Continue scanning
The algorithm keeps moving forward, reusing LPS whenever mismatches occur.
Step D: Full match found
At i=14
, j advances to 5 (pattern length).
β FULL MATCH found!
Start index = i - m + 1 = 14 - 5 + 1 = 10
β
Pattern "ababd"
occurs in the text starting at index 10.
Compare the pattern right-to-left; on a mismatch, skip ahead using bad-character and good-suffix rules so many text characters are never touched.
Example inputs and outputs
Example 1
$$ \text{Text: } "HERE IS A SIMPLE EXAMPLE", \quad \text{Pattern: } "EXAMPLE" $$
$$ \text{Output: match at index } 17 $$
Example 2
$$ \text{Text: } "NEEDLE IN A HAYSTACK", \quad \text{Pattern: } "STACK" $$
$$ \text{Output: match at index } 15 $$
How it works
Text (with spaces shown as _
):
0 10 20 30
H E R E _ I S _ A _ S I M P L E _ E X A M P L E
Pattern: "EXAMPLE"
(length = 7)
Step 1: Align pattern at text[10..16] = "SIMPLE"
Text: ... S I M P L E ...
Pattern: E X A M P L E
β (start comparing right β left)
Compare right-to-left:
E=E, L=L, P=P, M=M, A=A,
X vs I β mismatch
I
(from text).I
? β β no.I
.Step 2: Shift until pattern under "EXAMPLE"
Text: ... E X A M P L E
Pattern: E X A M P L E
Compare right-to-left:
E=E, L=L, P=P, M=M, A=A, X=X, E=E
β Full match found at index 17.
Compare rolling hashes of the current text window and the pattern; only if hashes match do a direct character check (to rule out collisions).
Example inputs and outputs
Example 1
$$ \text{Text: } "ABCDABCABCD", \quad \text{Pattern: } "ABC" $$
$$ \text{Output: matches at indices } 0, 4, 7 $$
Example 2
$$ \text{Text: } "ABCDE", \quad \text{Pattern: } "FG" $$
$$ \text{Output: no match} $$
How it works
Weβll use the classic choices:
val(c) = ASCII(c)
(e.g., A=65, B=66, C=67, D=68
)
Text: A B C D A B C A B C D
Index: 0 1 2 3 4 5 6 7 8 9 10
Pattern: A B C (m = 3)
Precompute
pow = B^(m-1) mod M = 256^2 mod 101 = 88
HP = H(P) = H("ABC")
Start h=0
, then for each char h = (B*h + val) mod M
:
(256*0 + 65) % 101 = 65
(256*65 + 66) % 101 = 41
(256*41 + 67) % 101 = 59
So HP = 59.
Rolling all windows
Initial window T[0..2]="ABC"
:
h0 = 59
(matches HP β verify chars β β
match at 0)
For rolling:
h_next = ( B * (h_curr β val(left) * pow) + val(new) ) mod M
(If the inner term is negative, add M
before multiplying.)
First two rolls
From $[0..2]$ "ABC" $(h0=59)$ β $[1..3]$ "BCD":
left='A'(65), new='D'(68)
inner = (59 β 65*88) mod 101 = (59 β 5720) mod 101 = 96
h1 = (256*96 + 68) mod 101 = 24644 mod 101 = 0
From $[3..5]$ "DAB" $(h3=66)$ β $[4..6]$ "ABC":
left='D'(68), new='C'(67)
inner = (66 β 68*88) mod 101 = (66 β 5984) mod 101 = 41
h4 = (256*41 + 67) mod 101 = 10563 mod 101 = 59 (= HP)
All windows (start index s)
s window hash =HP?
0 ABC 59 β β verify β β
MATCH at 0
1 BCD 0
2 CDA 38
3 DAB 66
4 ABC 59 β β verify β β
MATCH at 4
5 BCA 98
6 CAB 79
7 ABC 59 β β verify β β
MATCH at 7
8 BCD 0
Matches at indices: 0, 4, 7.