Last modified: September 01, 2025

This article is written in: πŸ‡ΊπŸ‡Έ

Searching

Searching refers to the process of finding the location of a specific element within a collection of data, such as an array, list, tree, or graph. It underpins many applications, from databases and information retrieval to routing and artificial intelligence. Depending on the organization of the data, different search techniques are usedβ€”such as linear search for unsorted data, binary search for sorted data, and more advanced approaches like hash-based lookup or tree traversals for hierarchical structures. Efficient searching is important because it directly impacts the performance and scalability of software systems.

Scan the list from left to right, comparing the target with each element until you either find a match (return its index) or finish the list (report β€œnot found”).

Example inputs and outputs

Example 1

$$ \text{Input: } [7, 3, 5, 2, 9], \quad \text{target} = 5 $$

$$ \text{Output: } \text{index} = 2 $$

Example 2

$$ \text{Input: } [4, 4, 4], \quad \text{target} = 4 $$

$$ \text{Output: } \text{index} = 0 (\text{first match}) $$

Example 3

$$ \text{Input: } [10, 20, 30], \quad \text{target} = 25 $$

$$ \text{Output: } \text{not found} $$

How Linear Search Works

We start at index 0, compare the value with the target, and keep moving right until we either find it or reach the end.

Target 5 in [7, 3, 5, 2, 9]

Indexes:   0    1    2    3    4
List:     [7]  [3]  [5]  [2]  [9]
Target: 5

Step 1: pointer at index 0

|
v
7   3   5   2   9

β†’ compare 7 vs 5  β†’ no

Step 2: pointer moves to index 1

|
    v
7   3   5   2   9

β†’ compare 3 vs 5  β†’ no

Step 3: pointer moves to index 2

|
        v
7   3   5   2   9

β†’ compare 5 vs 5  β†’ YES βœ… β†’ return index 2

Worst Case (Not Found)

Target 9 in [1, 2, 3]

Indexes:   0    1    2
List:     [1]  [2]  [3]
Target: 9

Checks:

β†’  1 β‰  9
β†’  2 β‰  9
β†’  3 β‰  9
β†’  end
β†’ not found ❌

Place one copy of the target at the very end as a β€œsentinel” so the scan can run without checking bounds each step; afterward, decide whether the match was inside the original list or only at the sentinel position.

Example inputs and outputs

Example 1

$$ \text{Input: } [12, 8, 6, 15], \quad \text{target} = 6 $$

$$ \text{Output: } \text{index} = 2 $$

Example 2

$$ \text{Input: } [2, 4, 6, 8], \quad \text{target} = 5 $$

$$ \text{Output: } \text{not found } (\text{only the sentinel matched}) $$

How it works

Put the target at one extra slot at the end so the loop is guaranteed to stop on a match; afterward, check whether the match was inside the original range.

Target 11 not in the list

Original list (n=5):
[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ]
Target: 11

Add sentinel (extra slot):

[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ][ 11 ]
  0    1    2    3    4     5  ← sentinel

Scan step by step:

4 β‰  11   β†’ pointer at 0
9 β‰  11   β†’ pointer at 1
1 β‰  11   β†’ pointer at 2
7 β‰  11   β†’ pointer at 3
6 β‰  11   β†’ pointer at 4
11 = 11  β†’ pointer at 5 (sentinel)

Therefore, not found in original list.

Target 6 inside the list

Original list (n=4):
[ 12 ][ 8 ][ 6 ][ 15 ]
Target: 6

Add sentinel:

[ 12 ][ 8 ][ 6 ][ 15 ][ 6 ]
  0     1    2    3     4

Scan:

12 β‰  6   β†’ index 0
 8 β‰  6   β†’ index 1
 6 = 6   β†’ index 2 βœ…

On a sorted array, repeatedly halve the search interval by comparing the target to the middle element until found or the interval is empty.

Example inputs and outputs

Example 1

$$ \text{Input: } A = [2, 5, 8, 12, 16, 23, 38], \quad \text{target} = 16 $$

$$ \text{Output: } \text{index} = 4 $$

Example 2

$$ \text{Input: } A = [1, 3, 3, 3, 9], \quad \text{target} = 3 $$

$$ \text{Output: } \text{index} = 2 \quad (\text{any valid match; first/last needs a variant}) $$

Example 3

$$ \text{Input: } A = [10, 20, 30, 40], \quad \text{target} = 35 $$

$$ \text{Output: } \text{not found} $$

How it works

We repeatedly check the middle element, and then discard half the list based on comparison.

Find 16 in:

A = [  2 ][  5 ][  8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i =    0     1     2     3     4     5     6

Step 1

low = 0, high = 6
mid = (0+6)//2 = 3
A[3] = 12 < 16  β†’  target is to the RIGHT  β†’  new low = mid + 1 = 4

A = [  2 ][  5 ][  8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i =    0     1     2     3     4     5     6
        ↑L                ↑M                ↑H
        0                 3                 6
Active range: indices 0..6

Step 2

low = 4, high = 6
mid = (4+6)//2 = 5
A[5] = 23 > 16  β†’  target is to the LEFT   β†’  new high = mid - 1 = 4

A = [  2 ][  5 ][  8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i =    0     1     2     3     4     5     6
                              ↑L      ↑M      ↑H
                              4       5       6
Active range: indices 4..6

Step 3

low = 4, high = 4
mid = 4
A[4] = 16 == target βœ…

A = [  2 ][  5 ][  8 ][ 12 ][ 16 ][ 23 ][ 38 ]
i =    0     1     2     3     4     5     6
                              ↑LMH
                              4
Active range: indices 4..4

FOUND at index 4

Like binary, but splits the current interval into three parts using two midpoints; used mainly for unimodal functions or very specific array cases.

Example inputs and outputs

Example 1

$$ \text{Input: } A = [1, 4, 7, 9, 12, 15], \quad \text{target} = 9 $$

$$ \text{Output: } \text{index} = 3 $$

Example 2

$$ \text{Input: } A = [2, 6, 10, 14], \quad \text{target} = 5 $$

$$ \text{Output: } \text{not found} $$

How it works

We divide the array into three parts using two midpoints m1 and m2.

A = [  1 ][  4 ][  7 ][  9 ][ 12 ][ 15 ]
i =    0     1     2     3     4     5
Target: 9

Step 1

low = 0, high = 5

m1 = low + (high - low)//3  = 0 + (5)//3 = 1
m2 = high - (high - low)//3 = 5 - (5)//3 = 3

A[m1] = 4
A[m2] = 9

A = [  1 ][  4 ][  7 ][  9 ][ 12 ][ 15 ]
i =    0     1     2     3     4     5
        ↑L    ↑m1        ↑m2           ↑H
        0     1          3             5

FOUND at index 3

On a sorted array, jump ahead in fixed block sizes to find the block that may contain the target, then do a linear scan inside that block.

Example inputs and outputs

Example 1

$$ \text{Input: } A = [1, 4, 9, 16, 25, 36, 49], \quad \text{target} = 25, \quad \text{jump} = \lfloor \sqrt{7} \rfloor = 2 $$

$$ \text{Output: } \text{index} = 4 $$

Example 2

$$ \text{Input: } A = [3, 8, 15, 20, 22, 27], \quad \text{target} = 21, \quad \text{jump} = 2 $$

$$ \text{Output: } \text{not found} $$

How it works

Perfect β€” that’s a jump search trace. Let me reformat and polish it so the steps are crystal clear and the β€œjump + linear scan” pattern pops visually:

We’re applying jump search to find $25$ in

$$ A = [1, 4, 9, 16, 25, 36, 49] $$

with $n=7$, block size $\approx \sqrt{7} \approx 2$, so jump=2.

We probe every 2nd index:

So target is in block $(2..4]$

[ 1 ][ 4 ] | [ 9 ][16 ] | [25 ][36 ] | [49 ]
    ^            ^            ^            ^
   probe=0      probe=2      probe=4      probe=6

Linear Scan in block (indexes 3..4)

Block [16 ][25 ]
       ^    ^
      i=3  i=4 (found!)

The element $25$ is found at index 4.

On a sorted array, grow the right boundary exponentially (1, 2, 4, 8, …) to find a containing range, then finish with binary search in that range.

Example inputs and outputs

Example 1

$$ \text{Input: } A = [2, 3, 5, 7, 11, 13, 17, 19, 23], \quad \text{target} = 19 $$

$$ \text{Output: } \text{index} = 7 $$

Example 2

$$ \text{Input: } A = [10, 20, 30, 40, 50], \quad \text{target} = 12 $$

$$ \text{Output: } \text{not found} $$

How it works

A = [  2 ][  3 ][  5 ][  7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =    0     1     2     3    4     5     6     7     8
Target = 19

Find range by exponential jumps

Start at i=1, double each step until A[i] β‰₯ target (or end).

Jump 1: i=1

A[i]=3 ≀ 19  β†’ continue
A = [  2 ][  3 ][  5 ][  7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =    0     1     2     3    4     5     6     7     8
             ↑

Jump 2: i=2

A[i]=5 ≀ 19  β†’ continue
A = [  2 ][  3 ][  5 ][  7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =    0     1     2     3    4     5     6     7     8
                   ↑

Jump 3: i=4

A[i]=11 ≀ 19 β†’ continue
A = [  2 ][  3 ][  5 ][  7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =    0     1     2     3    4     5     6     7     8
                               ↑

Jump 4: i=8

A[i]=23 > 19  β†’ stop
Range is (previous power of two .. i] = (4 .. 8] β†’ search indices 5..8
A = [  2 ][  3 ][  5 ][  7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =    0     1     2     3    4     5     6     7     8
                                                         ↑

Range for binary search: low=5, high=8.

Binary search on $A[5..8]$

Subarray: [ 13 ][ 17 ][ 19 ][ 23 ]
Indices :    5     6     7     8

Step 1

low=5, high=8 β†’ mid=(5+8)//2=6
A[6]=17 < 19 β†’ move right β†’ low=7
A = [  2 ][  3 ][  5 ][  7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =    0     1     2     3    4     5     6     7     8
                                    ↑L      ↑M             ↑H
                                    5       6              8

Step 2

low=7, high=8 β†’ mid=(7+8)//2=7
A[7]=19 == target βœ… β†’ FOUND
A = [  2 ][  3 ][  5 ][  7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
i =    0     1     2     3    4     5     6     7     8
                                           ↑L M H
                                              7

Found at index 7.

On a sorted (roughly uniformly distributed) array, estimate the likely position using the values themselves and probe there; repeat on the narrowed side.

Example inputs and outputs

Example 1

$$ \text{Input: } A = [10, 20, 30, 40, 50, 60, 70], \quad \text{target} = 55 $$

$$ \text{Output: } \text{not found } (\text{probes near indices } 4\text{–}5) $$

Example 2

$$ \text{Input: } A = [5, 15, 25, 35, 45, 55, 65], \quad \text{target} = 45 $$

$$ \text{Output: } \text{index} = 4 $$

Example 3

$$ \text{Input: } A = [1, 1000, 1001, 1002], \quad \text{target} = 2 $$

$$ \text{Output: } \text{not found } (\text{bad distribution for interpolation}) $$

How it works

Probe formula:

pos β‰ˆ low + (high - low) * (target - A[low]) / (A[high] - A[low])

Let say we have following array and target:

A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
i =    0     1     2     3     4     5     6
target = 45

Step 1 β€” initial probe

low=0 (A[0]=10), high=6 (A[6]=70)

pos β‰ˆ 0 + (6-0) * (45-10)/(70-10)
    β‰ˆ 6 * 35/60
    β‰ˆ 3.5  β†’ probe near 3.5

A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
i =    0     1     2     3     4     5     6
      ↑L                                   ↑H
            ↑posβ‰ˆ3.5 β†’ choose ⌊posβŒ‹=3 (or ⌈posβŒ‰=4)

Probe index 3: A[3]=40 < 45 β†’ set low = 3 + 1 = 4

Step 2 β€” after moving low

A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
i =    0     1     2     3     4     5     6
                                ↑L         ↑H

At this point, an early-stop check already tells us target (45) < A[low] (50) β†’ cannot exist in A[4..6] β†’ not found.

Map a key to an array index with a hash function; look at that bucket to find the key, giving expected $O(1)$ lookups under a good hash and healthy load factor.

Example inputs and outputs

Example 1

$$ \text{Table size: } m = 7, \quad \text{Keys stored: } {10, 24, 31}, \quad \text{Target: } 24 $$

$$ \text{Output: } \text{found (bucket 3)} $$

Example 2

$$ \text{Table size: } m = 7, \quad \text{Keys stored: } {10, 24, 31}, \quad \text{Target: } 18 $$

$$ \text{Output: } \text{not found} $$

How it works

+-----+      hash       +-----------------+      search/compare      +--------+
| key | --------------> | index in array  | ----------------------> | match? |
+-----+                  +-----------------+                         +--------+

Array (buckets/indexes 0..6):

Idx:   0     1     2     3     4     5     6
      +---+-----+-----+-----+-----+-----+-----+
      |   |     |     |     |     |     |     |
      +---+-----+-----+-----+-----+-----+-----+

Example mapping with h(k) = k mod 7, stored keys {10, 24, 31} all hash to index 3.

Strategy A β€” Separate Chaining (linked list per bucket)

Insertions

10 -> 3
24 -> 3   (collides with 10; append to bucket[3] list)
31 -> 3   (collides again; append to bucket[3] list)

Idx:   0     1     2     3     4     5     6
      +---+-----+-----+-----+-----+-----+-----+
      |   |     |     | β€’   |     |     |     |
      +---+-----+-----+-----+-----+-----+-----+

bucket[3] chain:  [10] β†’ [24] β†’ [31] β†’ βˆ…

Search(24)

1) Compute index = h(24) = 3
2) Inspect bucket 3's chain:
      [10]  β†’  [24]  β†’  [31]
               ↑ found here
3) Return FOUND (bucket 3)

Strategy B β€” Open Addressing (Linear Probing)

Insertions

10 -> 3                    place at 3
24 -> 3  (occupied)  β†’  probe 4  β†’ place at 4
31 -> 3  (occ) β†’ 4 (occ) β†’ probe 5 β†’ place at 5

Idx:   0     1     2     3     4     5     6
      +---+-----+-----+-----+-----+-----+-----+
      |   |     |     | 10  | 24  | 31  |     |
      +---+-----+-----+-----+-----+-----+-----+

Search(24)

1) Compute index = h(24) = 3
2) Probe sequence:
      3: 10 β‰  24  β†’ continue
      4: 24 = target  β†’ FOUND at index 4
   (If not found, continue probing until an empty slot or wrap limit.)

Open Addressing β€” Linear Probing

Keep everything in one array; on collision, probe alternative positions in a deterministic sequence until an empty slot or the key is found.

Example inputs and outputs

Example 1

$$ m = 10, \quad \text{Stored keys: } {12, 22, 32}, \quad \text{Target: } 22 $$

$$ \text{Output: } \text{found (index 3)} $$

Example 2

$$ m = 10, \quad \text{Stored keys: } {12, 22, 32}, \quad \text{Target: } 42 $$

$$ \text{Output: } \text{not found} $$

How it works

Hash function:

h(k) = k mod 10
Probe sequence: i, i+1, i+2, ... (wrap around)

Insertions

Resulting table (indexes 0..9):

Index:   0   1   2    3    4   5   6   7   8   9
        +---+---+----+----+----+---+---+---+---+---+
Value:  |   |   | 12 | 22 | 32 |   |   |   |   |   |
        +---+---+----+----+----+---+---+---+---+---+

Search(22)

Path followed:

2 β†’ 3

Search(42)

Path followed:

2 β†’ 3 β†’ 4 β†’ 5 (βˆ…)

Open Addressing β€” Quadratic Probing

Example inputs and outputs

Example 1

$$ m = 11 (\text{prime}), \quad \text{Stored keys: } {22, 33, 44}, \quad \text{Target: } 33 $$

$$ h(k) = k \bmod m, \quad h(33) = 33 \bmod 11 = 0 $$

$$ \text{Output: found (index 4)} $$

Example 2

$$ m = 11 (\text{prime}), \quad \text{Stored keys: } {22, 33, 44}, \quad \text{Target: } 55 $$

$$ h(55) = 55 \bmod 11 = 0 $$

$$ \text{Output: not found} $$

How it works

Hash function:

h(k) = k mod 11

Probe sequence (relative offsets):

+1Β², +2Β², +3Β², ... mod 11
= +1, +4, +9, +5, +3, +3²… (wrapping around table size)

So from h(k), we try slots in this order:

h, h+1, h+4, h+9, h+5, h+3, ...   (all mod 11)

Insertions

Resulting table:

Idx:   0    1    2   3   4   5  6  7  8  9  10
      +----+----+---+---+----+---+--+--+--+---+
Val:  | 22 | 33 |   |   | 44 |   |  |  |  |  |   |
      +----+----+---+---+----+---+--+--+--+--+---+

Search(33)

Path:

0 β†’ 1

Search(55)

Path:

0 β†’ 1 β†’ 4 β†’ 9 (βˆ…)

Open Addressing β€” Double Hashing

Example inputs and outputs

Hash functions

$$ h_{1}(k) = k \bmod 11, \quad h_{2}(k) = 1 + (k \bmod 10) $$

Probing sequence:

$$ h(k,i) = \big(h_{1}(k) + i \cdot h_{2}(k)\big) \bmod 11 $$

Example 1

$$ m = 11, \quad \text{Stored keys: } {22, 33, 44}, \quad \text{Target: } 33 $$

For $k = 33$:

$$ h_{1}(33) = 33 \bmod 11 = 0, \quad h_{2}(33) = 1 + (33 \bmod 10) = 1 + 3 = 4 $$

So probe sequence is

$$ h(33,0) = 0, h(33,1) = (0 + 1\cdot 4) \bmod 11 = 4, h(33,2) = (0 + 2\cdot 4) \bmod 11 = 8, \dots $$

Since the stored layout places $33$ at index $4$, the search succeeds.

$$ \text{Output: found (index 4)} $$

Example 2

$$ m = 11, \quad \text{Stored keys: } {22, 33, 44}, \quad \text{Target: } 55 $$

For $k = 55$:

$$ h_{1}(55) = 55 \bmod 11 = 0, \quad h_{2}(55) = 1 + (55 \bmod 10) = 1 + 5 = 6 $$

Probing sequence:

$$ 0, (0+6)\bmod 11 = 6, (0+2\cdot 6)\bmod 11 = 1, (0+3\cdot 6)\bmod 11 = 7, \dots $$

No slot matches $55$.

$$ \text{Output: not found} $$

How it works

We use two hash functions:

h₁(k) = k mod m
hβ‚‚(k) = 1 + (k mod 10)

Probe sequence:

i, i + hβ‚‚, i + 2Β·hβ‚‚, i + 3Β·hβ‚‚, ...  (all mod m)

This ensures fewer clustering issues compared to linear or quadratic probing.

Insertions (m = 11)

Insert 22

Insert 33

Insert 44

Table State

Idx:   0   1  2  3   4   5  6  7  8  9  10
      +---+---+---+---+---+---+---+---+---+---+
Val:  |22 |   |   |   |33 |44 |   |   |   |   |   |
      +---+---+---+---+---+---+---+---+---+---+---+

Search(33)

Path:

0 β†’ 4

Search(55)

Path:

0 β†’ 6 (βˆ…)

Separate Chaining

Each array cell holds a small container (e.g., a linked list); colliding keys live together in that bucket.

Example inputs and outputs

Setup

$$ m = 5, \quad h(k) = k \bmod 5, \quad \text{buckets hold linked lists} $$

Keys stored:

$$ {12, 22, 7, 3, 14} $$

Bucket contents after hashing:

$$ \begin{aligned} h(12) &= 12 \bmod 5 = 2 \Rightarrow& \text{bucket 2: } [12] [6pt] h(22) &= 22 \bmod 5 = 2 \Rightarrow& \text{bucket 2: } [12, 22] [6pt] h(7) &= 7 \bmod 5 = 2 \Rightarrow& \text{bucket 2: } [12, 22, 7] [6pt] h(3) &= 3 \bmod 5 = 3 \Rightarrow& \text{bucket 3: } [3] [6pt] h(14) &= 14 \bmod 5 = 4 \Rightarrow& \text{bucket 4: } [14] \end{aligned} $$

Example 1

$$ \text{Target: } 22 $$

$$ h(22) = 2 \Rightarrow \text{bucket 2} = [12, 22, 7] $$

Found at position 2 in the list.

$$ \text{Output: found (bucket 2, position 2)} $$

Example 2

$$ \text{Target: } 9 $$

$$ h(9) = 9 \bmod 5 = 4 \Rightarrow \text{bucket 4} = [14] $$

No match.

$$ \text{Output: not found} $$

How it works

h(k) = k mod 5
Buckets store small lists (linked lists or dynamic arrays)

Idx:   0             1           2                    3            4
     [  ]         [  ]       [ 12 β†’ 22 β†’ 7 ]       [ 3 ]       [ 14 ]

Search(22):
- Compute bucket b = h(22) = 2
- Linearly scan bucket 2 β†’ find 22

Search(9):
- b = h(9) = 4
- Bucket 4: [14] β†’ 9 not present β†’ NOT FOUND

Cuckoo Hashing

Keep two (or more) hash positions per key; insert by β€œkicking out” occupants to their alternate home so lookups check only a couple of places.

Example inputs and outputs

Setup Two hash tables $T_{1}$ and $T_{2}$, each of size

$$ m = 5 $$

Two independent hash functions:

$$ h_{1}(k), \quad h_{2}(k) $$

Cuckoo hashing invariant:

Example 1

$$ \text{Target: } 15 $$

Lookup procedure:

  1. Check $T_{1}[h_{1}(15)]$.
  2. If not found, check $T_{2}[h_{2}(15)]$.

Result:

$$ \text{found in } T_{2} \text{ at index } 4 $$

$$ \text{Output: found (Tβ‚‚, index 4)} $$

Example 2

If insertion causes repeated displacements and eventually loops:

$$ \text{Cycle detected } \Rightarrow \text{rehash with new } h_{1}, h_{2} $$

$$ \text{Output: rebuild / rehash required} $$

How it works

We keep two hash tables (T₁, Tβ‚‚), each with its own hash function. Every key can live in exactly one of two possible slots:

Hash functions:

$$ h_1(k) = k \bmod 5, \quad h_2(k) = 1 + (k \bmod 4) $$

Every key can live in exactly one of two slots: $T_1[h_1(k)]$ or $T_2[h_2(k)]$. If a slot is occupied, we evict the old occupant and reinsert it at its alternate location.

Start empty:

T₁: [   ][   ][   ][   ][   ]
Tβ‚‚: [   ][   ][   ][   ][   ]

Insert 10 β†’ goes to $T_1[h_1(10)=0]$:

T₁: [10 ][   ][   ][   ][   ]
Tβ‚‚: [   ][   ][   ][   ][   ]

Insert 15

T₁: [15 ][   ][   ][   ][   ]
Tβ‚‚: [   ][   ][   ][10 ][   ]

Insert 20

T₁: [20 ][   ][   ][   ][   ]
Tβ‚‚: [   ][   ][   ][10 ][15 ]

Insert 25

T₁: [25 ][   ][   ][   ][   ]
Tβ‚‚: [   ][20 ][   ][10 ][15 ]

πŸ”Ž Search(15)

FOUND in Tβ‚‚ at index 4

Bloom Filter

Space-efficient structure for fast membership tests; answers β€œmaybe present” or β€œdefinitely not present” with a tunable false-positive rate and no false negatives (if built correctly, without deletions).

Example inputs and outputs

Setup

$$ m = 16 \text{bits}, \quad k = 3 \text{hash functions } (h_{1}, h_{2}, h_{3}) $$

Inserted set:

$$ {"cat", "dog"} $$

Example 1

$$ \text{Query: contains("cat")} $$

All $h_{i}(\text{"cat"})$ bits are set β†’ actual member.

$$ \text{Output: maybe present (true positive)} $$

Example 2

$$ \text{Query: contains("cow")} $$

One probed bit = 0 β†’ cannot be present.

$$ \text{Output: definitely not present} $$

Example 3

$$ \text{Query: contains("eel")} $$

All $h_{i}(\text{"eel"})$ bits happen to be set, even though "eel" was never inserted.

$$ \text{Output: maybe present (false positive)} $$

How it works

Initial state (all zeros):

Idx:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
A  =  [0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]

Insert "cat"

h1(cat) = 3,  h2(cat) = 7,  h3(cat) = 12
β†’ Set bits at 3, 7, 12

Idx:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
A  =  [0  0  0  1  0  0  0  1  0  0  0  0  1  0  0  0]
                 ^              ^              ^
                 3              7              12

Insert "dog"

h1(dog) = 1,  h2(dog) = 7,  h3(dog) = 9
β†’ Set bits at 1, 7, 9  (7 already set)

Idx:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
A  =  [0  1  0  1  0  0  0  1  0  1  0  0  1  0  0  0]
           ^              ^       ^
           1              7       9

Query "cow"

h1(cow) = 1 β†’ bit[1] = 1
h2(cow) = 3 β†’ bit[3] = 1
h3(cow) = 6 β†’ bit[6] = 0  ❌

Idx:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
A  =  [0  1  0  1  0  0  0  1  0  1  0  0  1  0  0  0]
           βœ“              βœ“     βœ—

At least one zero β†’ DEFINITELY NOT PRESENT

Query "eel"

h1(eel) = 7 β†’ bit[7] = 1
h2(eel) = 9 β†’ bit[9] = 1
h3(eel) = 12 β†’ bit[12] = 1

Idx:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
A  =  [0  1  0  1  0  0  0  1  0  1  0  0  1  0  0  0]
                             βœ“     βœ“        βœ“

All ones β†’ MAYBE PRESENT (could be a false positive)

Counting Bloom Filter

Bloom filter variant that keeps a small counter per bit so you can delete by decrementing; still probabilistic and may have false positives.

Example inputs and outputs

Setup

$$ m = 12 \text{counters (each 2–4 bits)}, \quad k = 3 \text{hash functions} $$

Inserted set:

$$ {\text{"alpha"}, \text{"beta"}} $$

Then delete "alpha".

Example 1

$$ \text{Query: contains("alpha")} $$

Counters for "alpha" decremented; at least one probed counter is now $0$.

$$ \text{Output: definitely not present} $$

Example 2

$$ \text{Query: contains("beta")} $$

All three counters for "beta" remain $>0$.

$$ \text{Output: maybe present} $$

Example 3

$$ \text{Query: contains("gamma")} $$

At least one probed counter is $0$.

$$ \text{Output: definitely not present} $$

How it works

Each cell is a small counter (e.g. 4-bits, range 0..15). This allows deletions: increment on insert, decrement on delete.

Initial state

Idx:   0  1  2  3  4  5  6  7  8  9 10 11
A  =  [0  0  0  0  0  0  0  0  0  0  0  0]

Insert "alpha"

Hashes: {2, 5, 9}
β†’ Increment those counters

Idx:   0  1  2  3  4  5  6  7  8  9 10 11
A  =  [0  0  1  0  0  1  0  0  0  1  0  0]
             ↑        ↑              ↑
             2        5              9

Insert "beta"

Hashes: {3, 5, 11}
β†’ Increment those counters

Idx:   0  1  2  3  4  5  6  7  8  9 10 11
A  =  [0  0  1  1  0  2  0  0  0  1  0  1]
                ↑        ↑              ↑
                3        5              11

Lookup "beta"

Hashes: {3, 5, 11}
Counters = {1, 2, 1} β†’ all > 0
β†’ Result: MAYBE PRESENT

Idx:   0  1  2  3  4  5  6  7  8  9 10 11
A  =  [0  0  1  1  0  2  0  0  0  1  0  1]
                βœ“        βœ“              βœ“

Delete "alpha"

Hashes: {2, 5, 9}
β†’ Decrement those counters

Idx:   0  1  2  3  4  5  6  7  8  9 10 11
A  =  [0  0  0  1  0  1  0  0  0  0  0  1]
             ↓        ↓              ↓
             2        5              9

Lookup "alpha"

Hashes: {2, 5, 9}
Counters = {0, 1, 0}
β†’ At least one zero
β†’ Result: DEFINITELY NOT PRESENT

Idx:   0  1  2  3  4  5  6  7  8  9 10 11
A  =  [0  0  0  1  0  1  0  0  0  0  0  1]
             βœ—        βœ“              βœ—

Cuckoo Filter

Hash-table–style filter that stores short fingerprints in two possible buckets; supports insert, lookup, delete with low false-positive rates and high load factors.

Example inputs and outputs

Setup

$$ b = 8 \text{buckets}, \quad \text{bucket size} = 2, \quad \text{fingerprint size} = 8 \text{bits} $$

Inserted set:

$$ {\text{"cat"}, \text{"dog"}, \text{"eel"}} $$

Each element is stored as a short fingerprint in one of two candidate buckets.

Example 1

$$ \text{Query: contains("cat")} $$

Fingerprint for "cat" is present in one of its candidate buckets.

$$ \text{Output: maybe present (true positive)} $$

Example 2

$$ \text{Query: contains("fox")} $$

Fingerprint for "fox" is absent from both candidate buckets.

$$ \text{Output: definitely not present} $$

Example 3 (Deletion)

$$ \text{Operation: remove("dog")} $$

Fingerprint for "dog" is removed from its bucket.

$$ \text{Result: deletion supported directly by removing the fingerprint} $$

How it works

Each key x β†’ short fingerprint f = FP(x) Two candidate buckets:

Start (empty)

[0]: [ -- , -- ]   [1]: [ -- , -- ]   [2]: [ -- , -- ]   [3]: [ -- , -- ]
[4]: [ -- , -- ]   [5]: [ -- , -- ]   [6]: [ -- , -- ]   [7]: [ -- , -- ]

Insert "cat"

f = 0xA7
i1 = 1
i2 = 1 XOR H(0xA7) = 5

Bucket 1 has free slot β†’ place 0xA7 in [1]

[1]: [ A7 , -- ]

Insert "dog"

f = 0x3C
i1 = 5
i2 = 5 XOR H(0x3C) = 2

Bucket 5 has free slot β†’ place 0x3C in [5]

[1]: [ A7 , -- ]        [5]: [ 3C , -- ]

Insert "eel"

f = 0xD2
i1 = 1
i2 = 1 XOR H(0xD2) = 4

Bucket 1 has one free slot β†’ place 0xD2 in [1]

[1]: [ A7 , D2 ]        [5]: [ 3C , -- ]

Lookup "cat"

f = 0xA7
Buckets: i1 = 1, i2 = 5
Check: bucket[1] has A7 β†’ found

Result: MAYBE PRESENT

Lookup "fox"

f = 0x9B
i1 = 0
i2 = 0 XOR H(0x9B) = 7

Check buckets 0 and 7 β†’ fingerprint not found

Result: DEFINITELY NOT PRESENT

String Search Algorithms

Slide the pattern one position at a time over the text; at each shift compare characters left-to-right until a mismatch or a full match.

Example inputs and outputs

Example 1

$$ \text{Text: } "abracadabra", \quad \text{Pattern: } "abra" $$

$$ \text{Output: matches at indices } 0 \text{and} 7 $$

Example 2

$$ \text{Text: } "aaaaa", \quad \text{Pattern: } "aaa" $$

$$ \text{Output: matches at indices } 0, 1, 2 $$

How it works

Text (length 11):

Text:   a b r a c a d a b r  a
Idx:    0 1 2 3 4 5 6 7 8 9 10

Pattern (length 4):

Pattern: a b r a

Shift 0

Text:    a b r a
Pattern: a b r a

βœ… All match β†’ REPORT at index 0

Shift 1

Text:     b r a c
Pattern:  a b r a

❌ Mismatch at first char β†’ advance

Shift 2

Text:      r a c a
Pattern:   a b r a

❌ Mismatch β†’ advance

Shift 3

Text:       a c a d
Pattern:    a b r a

❌ Mismatch β†’ advance

Shift 4

Text:        c a d a
Pattern:     a b r a

❌ Mismatch β†’ advance

Shift 5

Text:         a d a b
Pattern:      a b r a

❌ Mismatch β†’ advance

Shift 6

Text:          d a b r
Pattern:       a b r a

❌ Mismatch β†’ advance

Shift 7

Text:           a b r a
Pattern:        a b r a

βœ… All match β†’ REPORT at index 7

Knuth–Morris–Pratt (KMP)

Precompute a table (LPS / prefix-function) for the pattern so that on a mismatch you β€œjump” the pattern to the longest proper prefix that is also a suffix, avoiding rechecks.

Example inputs and outputs

Example 1

$$ \text{Text: } "ababcabcabababd", \quad \text{Pattern: } "ababd" $$

$$ \text{Output: match at index } 10 $$

Example 2

$$ \text{Text: } "aaaaab", \quad \text{Pattern: } "aaab" $$

$$ \text{Output: match at index } 2 $$

How it works

We want to find the pattern "ababd" in the text "ababcabca babab d".

1) Precompute LPS (Longest Proper Prefix that is also a Suffix)

Pattern:

a   b   a   b   d
0   1   2   3   4   ← index

LPS array:

0   0   1   2   0

Meaning:

2) Scan Text with Two Pointers

Text:

a b a b c a b c a b a b a b d
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Pattern: a b a b d

Step A: Initial matches

i=0..3: "abab" matched β†’ j=4 points to 'd'

Step B: Mismatch at i=4

text[i=4] = 'c'
pattern[j=4] = 'd' β†’ mismatch

Instead of restarting, use LPS:

j = LPS[j-1] = LPS[3] = 2

So pattern jumps back to "ab" (no wasted text comparisons). i stays at 4.

Step C: Continue scanning

The algorithm keeps moving forward, reusing LPS whenever mismatches occur.

Step D: Full match found

At i=14, j advances to 5 (pattern length).

β†’ FULL MATCH found!
Start index = i - m + 1 = 14 - 5 + 1 = 10

βœ… Pattern "ababd" occurs in the text starting at index 10.

Boyer–Moore (BM)

Compare the pattern right-to-left; on a mismatch, skip ahead using bad-character and good-suffix rules so many text characters are never touched.

Example inputs and outputs

Example 1

$$ \text{Text: } "HERE IS A SIMPLE EXAMPLE", \quad \text{Pattern: } "EXAMPLE" $$

$$ \text{Output: match at index } 17 $$

Example 2

$$ \text{Text: } "NEEDLE IN A HAYSTACK", \quad \text{Pattern: } "STACK" $$

$$ \text{Output: match at index } 15 $$

How it works

Text (with spaces shown as _):

0        10        20        30
H E R E _ I S _ A _ S I M P L E _ E X A M P L E

Pattern: "EXAMPLE" (length = 7)

Step 1: Align pattern at text[10..16] = "SIMPLE"

Text:    ... S I M P L E ...
Pattern:       E X A M P L E
               ↑ (start comparing right β†’ left)

Compare right-to-left:

E=E, L=L, P=P, M=M, A=A, 
X vs I β†’ mismatch

Step 2: Shift until pattern under "EXAMPLE"

Text:    ... E X A M P L E
Pattern:     E X A M P L E

Compare right-to-left:

E=E, L=L, P=P, M=M, A=A, X=X, E=E

βœ… Full match found at index 17.

Rabin–Karp (RK)

Compare rolling hashes of the current text window and the pattern; only if hashes match do a direct character check (to rule out collisions).

Example inputs and outputs

Example 1

$$ \text{Text: } "ABCDABCABCD", \quad \text{Pattern: } "ABC" $$

$$ \text{Output: matches at indices } 0, 4, 7 $$

Example 2

$$ \text{Text: } "ABCDE", \quad \text{Pattern: } "FG" $$

$$ \text{Output: no match} $$

How it works

We’ll use the classic choices:

Text:    A B C D A B C A B C D
Index:   0 1 2 3 4 5 6 7 8 9 10
Pattern: A B C   (m = 3)

Precompute

pow = B^(m-1) mod M = 256^2 mod 101 = 88
HP = H(P) = H("ABC")

Start h=0, then for each char h = (B*h + val) mod M:

So HP = 59.

Rolling all windows

Initial window T[0..2]="ABC":

h0 = 59 (matches HP β†’ verify chars β†’ βœ… match at 0)

For rolling:

h_next = ( B * (h_curr βˆ’ val(left) * pow) + val(new) ) mod M

(If the inner term is negative, add M before multiplying.)

First two rolls

From $[0..2]$ "ABC" $(h0=59)$ β†’ $[1..3]$ "BCD":

left='A'(65), new='D'(68)
inner = (59 βˆ’ 65*88) mod 101 = (59 βˆ’ 5720) mod 101 = 96
h1 = (256*96 + 68) mod 101 = 24644 mod 101 = 0

From $[3..5]$ "DAB" $(h3=66)$ β†’ $[4..6]$ "ABC":

left='D'(68), new='C'(67)
inner = (66 βˆ’ 68*88) mod 101 = (66 βˆ’ 5984) mod 101 = 41
h4 = (256*41 + 67) mod 101 = 10563 mod 101 = 59  (= HP)

All windows (start index s)

s   window   hash   =HP?
0   ABC      59     βœ“  β†’ verify β†’ βœ… MATCH at 0
1   BCD       0
2   CDA      38
3   DAB      66
4   ABC      59     βœ“  β†’ verify β†’ βœ… MATCH at 4
5   BCA      98
6   CAB      79
7   ABC      59     βœ“  β†’ verify β†’ βœ… MATCH at 7
8   BCD       0

Matches at indices: 0, 4, 7.

Table of Contents

    Searching
    1. Linear & Sequential Search
      1. Linear Search
      2. Sentinel Linear Search
    2. Divide & Conquer Search
      1. Binary Search
      2. Ternary Search
      3. Jump Search
      4. Exponential Search
      5. Interpolation Search
    3. Hash-based Search
      1. Hash Table Search
      2. Open Addressing β€” Linear Probing
      3. Open Addressing β€” Quadratic Probing
      4. Open Addressing β€” Double Hashing
      5. Separate Chaining
      6. Cuckoo Hashing
    4. Probabilistic & Approximate Search
      1. Bloom Filter
      2. Counting Bloom Filter
      3. Cuckoo Filter
    5. String Search Algorithms
      1. Naive String Search
      2. Knuth–Morris–Pratt (KMP)
      3. Boyer–Moore (BM)
      4. Rabin–Karp (RK)