Last modified: January 02, 2026


Introduction to Data Structures & Algorithms

Data structures and algorithms are fundamental concepts in computer science, and mastering them is one of the most reliable ways to write efficient software.

Most “slow code” isn’t slow because the programmer typed badly; it’s slow because the program is doing the wrong kind of work for the size of the input. Data structures and algorithms are the two knobs you can turn to fix that. Once you see them as practical tools (not academic trivia), the whole topic becomes less intimidating and a lot more useful.

Data Structures

A data structure organizes and stores data in a way that allows efficient access, modification, and processing. The choice of the appropriate data structure depends on the specific use case and can significantly impact the performance of an application. Here are some common data structures:

A useful way to think about data structures: they’re not just “ways to store data”; they’re promises. An array promises fast indexing. A stack promises “last thing in is the first thing out.” A tree promises hierarchy. When you pick the structure whose promise matches your problem, the solution often becomes simpler and faster at the same time.

I. Array

Imagine an array as a row of lockers, each labeled with a number and capable of holding one item of the same type. Technically, arrays are blocks of memory storing elements sequentially, allowing quick access using an index. However, arrays have a fixed size, which limits their flexibility when you need to add or remove items.

A “do” with arrays: use them when you need quick random access by index, or when you’re scanning data in order. A “don’t”: assume insertions and deletions in the middle are cheap; shifting elements can turn a clean-looking solution into a slow one when the array is large.

Indices:  0   1   2   3
Array:   [A] [B] [C] [D]
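
To make the promise concrete, here is a small illustrative Python sketch (Python’s list is a dynamic array, so it shows the same costs; the values are arbitrary): indexing is a single jump, while inserting at the front shifts every later element.

letters = ["A", "B", "C", "D"]   # contiguous block, indexed 0..3

print(letters[2])                # O(1): jump straight to index 2 -> "C"

letters.insert(0, "Z")           # O(n): every existing element shifts one slot right
print(letters)                   # ['Z', 'A', 'B', 'C', 'D']

letters.append("E")              # amortized O(1): no shifting needed at the end
print(letters)                   # ['Z', 'A', 'B', 'C', 'D', 'E']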

II. Stack

Think of a stack like stacking plates: you always add new plates on top (push), and remove them from the top as well (pop). This structure follows the Last-In, First-Out (LIFO) approach, meaning the most recently added item is removed first. Stacks are particularly helpful in managing function calls (like in the call stack of a program) or enabling "undo" operations in applications.

Stacks shine when your problem has a “most recent context” feel: undo/redo, parsing, backtracking, evaluating expressions, matching parentheses. The “do” is to lean on the LIFO rule to avoid complicated bookkeeping. The “don’t” is to use a stack when you actually need “oldest first”; that’s a queue.

 ┌───┐
 │ C │  ← most recent (pop/push here)
 ├───┤
 │ B │
 ├───┤
 │ A │
 └───┘
Bottom
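
A minimal hedged Python sketch of the parenthesis-matching idea mentioned above (the function name is just for this example); the stack’s top always holds the most recent unmatched opener:

def brackets_balanced(text):
    """Return True if (), [], {} in `text` are properly nested."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []                          # top of the stack = most recent open bracket
    for ch in text:
        if ch in "([{":
            stack.append(ch)            # push
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:   # pop must match
                return False
    return not stack                    # leftover openers mean imbalance

print(brackets_balanced("(a[b]{c})"))   # True
print(brackets_balanced("(a[b)c]"))     # False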

III. Queue

A queue is similar to a line at the grocery store checkout. People join at the end (enqueue) and leave from the front (dequeue), adhering to the First-In, First-Out (FIFO) principle. This ensures the first person (or item) that arrives is also the first to leave. Queues work great for handling tasks or events in the exact order they occur, like scheduling print jobs or processing messages.

Queues are your go-to when fairness and order matter: job scheduling, buffering, breadth-first search, producer/consumer pipelines. A solid “do” is to use a queue when the problem sounds like “process things in the order they arrived.” A common “don’t” is implementing a queue with an array in a way that forces shifting on every dequeue; use a deque or circular buffer instead.

Front → [A] → [B] → [C] → [D] ← Rear
(dequeue)                 (enqueue)
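
As a quick hedged sketch of the “use a deque instead of shifting an array” advice, here is Python’s collections.deque doing FIFO processing (the task names are made up):

from collections import deque

jobs = deque()                 # deque gives O(1) appends and pops at both ends

jobs.append("print report")    # enqueue at the rear
jobs.append("send email")
jobs.append("resize images")

while jobs:
    task = jobs.popleft()      # dequeue from the front: oldest task first (FIFO)
    print("processing:", task)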

IV. Linked List

You can picture a linked list as a treasure hunt, where each clue leads you to the next one. Each clue, or node, holds data and a pointer directing you to the next node. Because nodes can be added or removed without shifting other elements around, linked lists offer dynamic and flexible management of data at any position.

Linked lists are great when you need frequent insertions or deletions once you already have the node, but they’re not great at random access. The “do” is to use them when you’re walking forward and rewiring pointers. The “don’t” is to treat them like arrays; finding “the 10,000th element” is slow because you have to traverse there step by step.

Head -> [A] -> [B] -> [C] -> NULL
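
A minimal Python sketch of a singly linked list (class and variable names are illustrative): traversal is step-by-step, but rewiring pointers at a known spot is cheap.

class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node          # pointer to the next node (the "next clue")

# Build Head -> A -> B -> C -> None
head = Node("A", Node("B", Node("C")))

# Traversal is sequential: reaching the k-th element means walking k steps.
node = head
while node is not None:
    print(node.value)
    node = node.next

# Insertion after a known node is O(1): just rewire two pointers.
head.next = Node("A2", next_node=head.next)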

V. Tree

A tree resembles a family tree, starting from one ancestor (the root) and branching out into multiple descendants (nodes), each of which can have their own children. Formally, trees are hierarchical structures organized across various levels. They’re excellent for showing hierarchical relationships, such as organizing files on your computer or visualizing company structures.

Trees matter because many real problems are hierarchical even when they don’t look like it at first: file systems, DOM structures, organization charts, decision processes, indexes. The “do” is to use tree traversals (pre/in/post/level-order) so your logic stays systematic. The “don’t” is to forget that tree shape affects performance; an unbalanced tree can quietly turn fast operations into slow ones.

# Tree
        (Root)
        /    \
     (L)     (R)
    /  \       \
 (LL) (LR)     (RR)
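
A small hedged Python sketch mirroring the diagram above and walking it in pre-order (class and node names are illustrative):

class TreeNode:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []   # list of child TreeNodes

# Mirror of the diagram: Root with L and R subtrees.
root = TreeNode("Root", [
    TreeNode("L", [TreeNode("LL"), TreeNode("LR")]),
    TreeNode("R", [TreeNode("RR")]),
])

def preorder(node, depth=0):
    """Visit a node, then recursively visit its children (pre-order)."""
    print("  " * depth + node.value)
    for child in node.children:
        preorder(child, depth + 1)

preorder(root)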

VI. Graph

Consider a graph like a network of cities connected by roads. Each city represents a node, and the roads connecting them are edges, which can either be one-way (directed) or two-way (undirected). Graphs effectively illustrate complex relationships and networks, such as social media connections, website link structures, or even mapping transportation routes.

Graphs are the “everything is connected” structure. The moment you see relationships that aren’t strictly hierarchical (friends, links, routes, dependencies), you’re in graph territory. The “do” is to ask early: is this graph directed or undirected, weighted or unweighted? That choice decides whether BFS, DFS, Dijkstra, or something else is the right move. The “don’t” is to ignore direction or weights and then wonder why the result feels wrong.

(A) ↔ (B)
 |       \
(C) ---> (D)

(↔ undirected edge, ---> directed edge)
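
As a hedged illustration, here is a small Python adjacency-list version of a graph loosely based on the diagram (the unlabeled edges are assumed undirected for the demo), plus a breadth-first search from A:

from collections import deque

# Adjacency list. Undirected edges appear in both directions; C -> D is directed.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],   # C -> D is the only one-way edge
    "D": ["B"],
}

def bfs(start):
    """Return nodes reachable from `start`, in breadth-first (FIFO) order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs("A"))   # ['A', 'B', 'C', 'D']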


Algorithms

An algorithm is like a clear and detailed set of instructions or steps for solving a specific problem or performing a particular task. Think of it like following a precise recipe when cooking.

The reason this “recipe” framing sticks is that good algorithms are repeatable and dependable: the same inputs give the same output, and the steps don’t depend on magic. When you’re debugging or optimizing, that predictability is everything; it’s what lets you reason about behavior instead of guessing.

To evaluate how good an algorithm is, we often look at its efficiency in terms of time complexity (how long it takes to run) and space complexity (how much memory it uses). We will discuss both in greater detail in later sections.

A quick “do and don’t” here: do focus on clarity first; an algorithm you can’t explain is hard to trust. Don’t confuse a clever trick with an algorithmic improvement; sometimes the biggest win is choosing a better approach, not writing tighter code.

Algorithms vs. Programs

An algorithm is a high-level blueprint for solving a specific problem. It is abstract, language-independent, and specifies a clear sequence of steps without relying on any particular programming syntax. An algorithm can be thought of as a recipe or method for solving a problem and can be represented in multiple forms, such as plain text or a flowchart.

The distinction matters because you can evaluate an algorithm before you write code: does it terminate, does it cover edge cases, what’s its complexity, how does it scale? That’s a superpower in interviews and in real projects: design first, implement second.

Example: Algorithm for adding two numbers:

Step 1: Start
Step 2: Declare variables num1, num2, and sum.
Step 3: Read values into num1 and num2.
Step 4: Add num1 and num2; store the result in sum.
Step 5: Print sum.
Step 6: Stop.

This algorithm can also be visualized using a flowchart:

---------------------
|       Start       |
---------------------
         |
         V
------------------------------
| Declare num1, num2, sum    |
------------------------------
         |
         V
--------------------------
| Read num1 and num2     |
--------------------------
         |
         V
--------------------------
| sum = num1 + num2      |
--------------------------
         |
         V
--------------------------
| Print sum              |
--------------------------
         |
         V
--------------------------
| Stop                   |
--------------------------

In contrast, a program is a concrete implementation of an algorithm. It is language-dependent and adheres to the specific syntax and rules of a programming language. For example, the above algorithm can be implemented in Python as:

# Read the two numbers as integers
num1 = int(input("Enter first number: "))
num2 = int(input("Enter second number: "))

# Add them and print the result
sum = num1 + num2
print("The sum is", sum)

Programs may sometimes run indefinitely or until an external action stops them. For instance, an operating system is a program designed to run continuously until explicitly terminated.

A good practical habit: when you write a program, keep the algorithm “visible” in the structure of the code through clear function names, logical steps, and clean invariants. That makes debugging and optimization feel like adjusting a plan, not untangling a knot.

Types of Algorithms

Algorithms can be classified into various types based on the problems they solve and the strategies they use. Here are some common categories with consistent explanations and examples:

This classification helps because it turns “a million problems” into “a few families.” When you recognize the family, you can reuse known techniques instead of reinventing solutions. The “do” is to build pattern recognition. The “don’t” is to memorize implementations without understanding what problem shape they fit.

I. Sorting Algorithms arrange data in a specific order, such as ascending or descending. Examples include bubble sort, insertion sort, selection sort, and merge sort.

Sorting is often the sneaky first step that makes everything else easier: once data is ordered, you can use binary search, two pointers, sweeping scans, and duplicate skipping. The “do” is to ask: am I allowed to reorder the data? The “don’t” is to sort blindly when order matters or when a hash-based approach would be cheaper.

Example: Bubble Sort

Initial Array: [5, 3, 8, 4, 2]

Steps:

  1. Compare adjacent elements and swap if needed.
  2. Repeat for all elements.

After 1st Pass: [3, 5, 4, 2, 8]
After 2nd Pass: [3, 4, 2, 5, 8]
After 3rd Pass: [3, 2, 4, 5, 8]
After 4th Pass: [2, 3, 4, 5, 8] (Sorted)
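
A minimal, hedged Python version consistent with the passes shown above (the early exit when a pass makes no swaps is an optional touch):

def bubble_sort(values):
    """Sort `values` in place by repeatedly swapping adjacent out-of-order pairs."""
    n = len(values)
    for end in range(n - 1, 0, -1):        # after each pass, the largest item sits at `end`
        swapped = False
        for i in range(end):
            if values[i] > values[i + 1]:
                values[i], values[i + 1] = values[i + 1], values[i]
                swapped = True
        if not swapped:                    # already sorted: stop early
            break
    return values

print(bubble_sort([5, 3, 8, 4, 2]))        # [2, 3, 4, 5, 8]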

II. Search Algorithms are designed to find a specific item or value within a collection of data. Examples include linear search, binary search, and depth-first search.

Searching is about trade-offs: linear search is simple and often fine for small inputs; binary search is fast but needs sorted data; DFS/BFS are “search” across relationships rather than lists. The “do” is to match the search method to the structure. The “don’t” is to use binary search without guaranteeing sortedness.

Example: Binary Search

Searching 33 in Sorted Array: [1, 3, 5, 7, 9, 11, 33, 45, 77, 89]

Steps:

  1. Start with the middle element.
  2. If the middle element is the target, return it.
  3. If the target is greater, ignore the left half.
  4. If the target is smaller, ignore the right half.
  5. Repeat until the target is found or the subarray is empty.

Mid element at start: 9
33 > 9, so discard left half
New mid element: 45
33 < 45, so discard right half
New mid element: 11
33 > 11, so discard left half
The remaining element is 33, which is the target.
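
A compact, hedged Python version of the same procedure (it assumes the input is already sorted; the index arithmetic follows the trace above):

def binary_search(sorted_values, target):
    """Return the index of `target` in `sorted_values`, or -1 if absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        elif target > sorted_values[mid]:
            low = mid + 1                  # discard the left half
        else:
            high = mid - 1                 # discard the right half
    return -1

data = [1, 3, 5, 7, 9, 11, 33, 45, 77, 89]
print(binary_search(data, 33))             # 6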

III. Graph Algorithms address problems related to graphs, such as finding the shortest path between nodes or determining if a graph is connected. Examples include Dijkstra's algorithm and the Floyd-Warshall algorithm.

Graph algorithms matter because graphs model real systems: routes, networks, dependencies, recommendations. The “do” is to pin down the graph type first (directed? weighted? cyclic?). The “don’t” is to treat all shortest-path problems the same; unweighted shortest paths call for BFS, but weighted shortest paths often need Dijkstra (or Bellman-Ford if negative edge weights exist).

Example: Dijkstra's Algorithm

Given a graph with weighted edges, find the shortest path from a starting node to all other nodes.

Steps:

  1. Initialize the starting node with a distance of 0 and all other nodes with infinity.
  2. Visit the unvisited node with the smallest known distance.
  3. Update the distances of its neighboring nodes.
  4. Repeat until all nodes have been visited.

Example Graph:

A -> B (1)
A -> C (4)
B -> C (2)
B -> D (5)
C -> D (1)

Trace Table

| Iter | Extracted node (u) | PQ before extraction | dist[A,B,C,D] | prev[A,B,C,D] | Visited | Comments / Updates |
|------|--------------------|----------------------|---------------|---------------|---------|--------------------|
| 0 | – (initial) | (0, A) | [0, ∞, ∞, ∞] | [-, -, -, -] | {} | Initialization: A=0, others ∞ |
| 1 | A (0) | (0, A) | [0, 1, 4, ∞] | [-, A, A, -] | {A} | Relax A→B (1), A→C (4); push (1,B), (4,C) |
| 2 | B (1) | (1, B), (4, C) | [0, 1, 3, 6] | [-, A, B, B] | {A, B} | Relax B→C: alt=3 < 4 ⇒ update C; B→D: dist[D]=6; push (3,C), (6,D); (4,C) becomes stale |
| 3 | C (3) | (3, C), (4, C) stale, (6, D) | [0, 1, 3, 4] | [-, A, B, C] | {A, B, C} | Relax C→D: alt=4 < 6 ⇒ update D; push (4,D); (6,D) becomes stale |
| 4 | D (4) | (4, D), (4, C) stale, (6, D) stale | [0, 1, 3, 4] | [-, A, B, C] | {A, B, C, D} | No outgoing improvements; done |

Legend: PQ holds (distance, node) pairs; “stale” marks queue entries whose recorded distance has since been improved; dist and prev are listed for nodes A, B, C, D in order.

Starting from A: the shortest distances are B = 1 (A→B), C = 3 (A→B→C), and D = 4 (A→B→C→D).
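
For reference, here is a hedged Python sketch of the same lazy-deletion approach traced above, using the standard-library heapq module as the priority queue (the dictionary-of-edge-lists encoding is an assumption for this demo):

import heapq

def dijkstra(graph, start):
    """Shortest distances from `start` in a graph given as {node: [(neighbor, weight), ...]}."""
    dist = {node: float("inf") for node in graph}
    dist[start] = 0
    pq = [(0, start)]                          # priority queue of (distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:                        # stale entry (see the trace above)
            continue
        for v, w in graph[u]:
            alt = d + w
            if alt < dist[v]:                  # relaxation step
                dist[v] = alt
                heapq.heappush(pq, (alt, v))
    return dist

graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
print(dijkstra(graph, "A"))   # {'A': 0, 'B': 1, 'C': 3, 'D': 4}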

IV. String Algorithms deal with problems related to strings, such as finding patterns or matching sequences. Examples include the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore algorithm.

String algorithms are common because text is everywhere: search bars, logs, DNA sequences, code, messages. The “do” is to watch for repeated work; naive substring checks can be painfully slow. The “don’t” is to reach for regex as a hammer for every nail; it’s powerful, but certain patterns can be unexpectedly expensive.

Example: Boyer-Moore Algorithm

Text:    "ABABDABACDABABCABAB"
Pattern: "ABABCABAB"

Steps:

  1. Compare the pattern from right to left.
  2. If a mismatch occurs, use the bad character and good suffix heuristics to skip alignments.
  3. Repeat until the pattern is found or the text is exhausted.

| Iter | Start | Text window | Mismatch (pattern vs text) | Shift applied | Next start | Result |
|------|-------|-------------|----------------------------|---------------|------------|--------|
| 1 | 0 | ABABDABAC | pattern[8]=B vs text[8]=C | bad char C → last in pattern at idx 4 ⇒ 8−4 = 4 | 4 | no match |
| 2 | 4 | DABACDABA | pattern[8]=B vs text[12]=A | bad char A → last at idx 7 ⇒ 8−7 = 1 | 5 | no match |
| 3 | 5 | ABACDABAB | pattern[4]=C vs text[9]=D | D not in pattern ⇒ 4−(−1) = 5 | 10 | no match |
| 4 | 10 | ABABCABAB | full right-to-left comparison → match | – | – | found at 10 |

Pattern matched starting at index 10 in the text.
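
Below is a simplified, hedged Python sketch that implements only the bad-character rule (full Boyer-Moore adds the good-suffix rule as well); it reproduces the shifts shown in the table:

def boyer_moore_bad_char(text, pattern):
    """Return the first index where `pattern` occurs in `text`, or -1 (bad-character rule only)."""
    last = {ch: i for i, ch in enumerate(pattern)}     # last index of each pattern character
    m, n = len(pattern), len(text)
    start = 0
    while start <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[start + j]:
            j -= 1                                     # compare right to left
        if j < 0:
            return start                               # full match at this alignment
        bad = text[start + j]
        start += max(1, j - last.get(bad, -1))         # skip ahead using the bad character
    return -1

print(boyer_moore_bad_char("ABABDABACDABABCABAB", "ABABCABAB"))   # 10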

Important Algorithms for Software Engineers

This is the part many people miss: in real work, you’re rarely rewarded for reimplementing Dijkstra from memory. You’re rewarded for recognizing that “shortest paths” is the right framing, picking the right variant, estimating cost, and using a reliable implementation. The “do” is to become fluent in selection and trade-offs. The “don’t” is to confuse “I can code it from scratch” with “I can solve the real problem.”

Real Life Story:

When Zara landed her first job at a logistics-tech startup, her assignment was to route delivery vans through a sprawling city in under a second, something she’d never tackled before. She remembered the semester she’d wrestled with graph theory and Dijkstra’s algorithm purely for practice, so instead of hand-coding the logic she opened the company’s Python stack and pulled in NetworkX, benchmarking its built-in shortest-path routines against the map’s size and the firm’s latency budget. The initial results were sluggish, so she compared A* with Dijkstra, toggling heuristics until the run time dipped below 500 ms, well under the one-second target. Her teammates were impressed not because she reinvented an algorithm, but because she knew which one to choose, how to reason about its complexity, and where to find a rock-solid library implementation. Later, in a sprint retrospective, Zara admitted that mastering algorithms in college hadn’t been about memorizing code; it had trained her to dissect problems, weigh trade-offs, and plug in the right tool when every millisecond and memory block counted.

Understanding Algorithmic Complexity

Algorithmic complexity helps us understand the computational resources (time or space) an algorithm needs as the input size increases. Here’s a breakdown of different types of complexity:

Complexity is the “budgeting” system for software. You don’t need exact microseconds to make good decisions; you need to know how costs grow when data grows. The “do” is to think: if the input doubles, what happens? The “don’t” is to be fooled by small tests; many algorithms look fine on tiny inputs and collapse at scale.

Analyzing Algorithm Growth Rates

Understanding how the running time or space complexity of an algorithm scales with increasing input size is pivotal in algorithm analysis. To describe this rate of growth, we employ several mathematical notations that offer insights into the algorithm's efficiency under different conditions.

These notations are less about fancy math and more about communicating clearly. When someone says “this is $O(n \log n)$,” they’re telling you how it behaves as you scale, and whether it’s likely to stay usable when today’s dataset becomes tomorrow’s dataset.

Big O Notation (O-notation)

The Big O notation represents an asymptotic upper bound; it is most often quoted for the worst-case scenario of an algorithm's time or space complexity. Essentially, it signifies an upper limit on the growth of a function.

If we designate $f(n)$ as the actual complexity and $g(n)$ as the function in Big O notation, stating $f(n) = O(g(n))$ implies that $f(n)$, the time or space complexity of the algorithm, grows no faster than $g(n)$.

For instance, if an algorithm has a time complexity of $O(n)$, it signifies that the algorithm's running time does not grow more rapidly than a linear function of the input size, in the worst-case scenario.


Big Omega Notation (Ω-notation)

The Big Omega notation provides an asymptotic lower bound on an algorithm's time or space complexity; informally, it is often paired with the best-case scenario.

If $f(n) = Ω(g(n))$, this means that $f(n)$ grows at a rate that is at least as fast as $g(n)$. In other words, $f(n)$ does not grow slower than $g(n)$.

For example, if an algorithm has a time complexity of $Ω(n)$, its running time is at least proportional to the input size for large inputs, even in the best case.


Theta Notation (Θ-notation)

Theta notation provides an asymptotically tight bound: the function grows neither faster nor slower than the bound, up to constant factors. It is sometimes loosely described as the “average case,” but strictly it just means the upper and lower bounds coincide.

Stating $f(n) = Θ(g(n))$ signifies that $f(n)$ grows at the same rate as $g(n)$ for large inputs; in other words, $f(n)$ is bounded both above and below by constant multiples of $g(n)$.


These notations primarily address the growth rate as the input size becomes significantly large. While they offer a high-level comprehension of an algorithm's performance, the actual running time in practice can differ based on various factors, such as the specific input data, the hardware or environment where the algorithm is operating, and the precise way the algorithm is implemented in the code.

Diving into Big O Notation Examples

Big O notation is a practical tool for comparing the worst-case scenario of algorithm complexities. Here are examples of various complexities:

The point of this list isn’t to memorize it like a chant; it’s to build intuition. When you see nested loops, your brain should whisper “quadratic?” When you see repeated halving, it should whisper “logarithmic?” That intuition is what helps you spot performance problems early.

The graph below illustrates the growth of these different time complexities:

(Figure: comparative growth curves of the time complexities listed below.)

Imagine you’re building an app where every millisecond counts: choosing the right algorithm can make it lightning-fast or tediously slow, so a solid grasp of time complexity pays off.

Here is a summary cheat sheet:

| Notation | Name | Meaning | Common Examples |
|----------|------|---------|-----------------|
| $O(1)$ | Constant time | Running time does not depend on input size $n$. | Array indexing, hash-table lookup |
| $O(\log n)$ | Logarithmic time | Time grows proportionally to the logarithm of $n$. | Binary search, operations on balanced BSTs |
| $O(n)$ | Linear time | Time grows linearly with $n$. | Single loop over an array, scanning for max/min |
| $O(n \log n)$ | Linearithmic time | Combination of linear and logarithmic growth. | Merge sort, heap sort, FFT |
| $O(n^2)$ | Quadratic time | Time grows proportionally to the square of $n$. | Bubble sort, selection sort, nested loops |
| $O(n^3)$ | Cubic time | Time grows proportionally to the cube of $n$. | Naïve matrix multiplication (3 nested loops) |
| $O(2^n)$ | Exponential time | Time doubles with each additional input element. | Recursive Fibonacci, brute-force subset enumeration |
| $O(n!)$ | Factorial time | Time grows factorially with $n$. | Brute-force permutation generation, brute-force TSP |

Interpreting Big O Notation

The main “do” here is to treat Big O as a communication tool: it lets you compare approaches and explain choices to others. The main “don’t” is to weaponize it; “this is $O(n)$” doesn’t automatically beat “this is $O(n \log n)$” if constants, data sizes, or real constraints tell a different story.

Can every problem have an O(1) algorithm?

There is a common beginner fantasy: “surely there’s always a constant-time trick.” Sometimes there isn’t, and that’s not a failure; it’s a reality of computation. Any algorithm that must examine all of its input, like finding the maximum of an unsorted array, needs at least linear time. The “do” is to accept lower bounds and design within them. The “don’t” is to chase miraculous optimizations when the problem inherently requires reading the input.

Recognising $O(\log n)$ and $O(n \log n)$ running-times

The growth rate of an algorithm almost always comes from how quickly the remaining work shrinks as the algorithm executes. Two common patterns are:

This is one of the most useful instincts you can build: look at the loop, ask what variable is changing, and ask whether it’s shrinking by subtraction (linear) or division (logarithmic). If you can do that, you can “feel” complexity without formal proofs.

Pattern Typical loop behaviour Resulting time-complexity
Halve (or otherwise divide) the problem each step $n \to n/2 \to n/4 \dots$ $\Theta(\log n)$
Do a linear amount of work, but each unit of work is itself logarithmic outer loop counts down one by one, inner loop halves $\Theta(n \log n)$

Below are four miniature algorithms written in language-neutral pseudocode (no Python syntax), followed by the intuition behind each bound.

I. Linear - $\Theta(n)$

procedure Linear(n)
    while n > 0 do
        n ← n − 1
    end while
end procedure

Work left drops by 1 each pass, so the loop executes exactly $n$ times.

II. Logarithmic - $\Theta(\log n)$

procedure Logarithmic(n)
    while n > 0 do
        n ← ⌊n / 2⌋          ▹ halve the problem
    end while
end procedure

Each pass discards half of the remaining input, so only $\lfloor\log_2 n\rfloor + 1$ iterations are needed.

Common real examples: binary search, finding the height of a complete binary tree.

III. Linear-logarithmic - $\Theta(n \log n)$

procedure LinearLogarithmic(n)
    m ← n
    while m > 0 do                ▹ runs n times
        k ← n
        while k > 0 do            ▹ runs log n times
            k ← ⌊k / 2⌋
        end while
        m ← m − 1
    end while
end procedure

Classic real-world instances: mergesort, heapsort, many divide-and-conquer algorithms, building a heap then doing $n$ delete-min operations.

IV. Squared-logarithmic - $\Theta(\log^2 n)$

procedure LogSquared(n)
    m ← n
    while m > 0 do                ▹ outer loop: log n times
        k ← n
        while k > 0 do            ▹ inner loop: log n times
            k ← ⌊k / 2⌋
        end while
        m ← ⌊m / 2⌋
    end while
end procedure

Both loops cut their control variable in half, so each contributes a $\log n$ factor, giving $\log^2 n$. Such bounds appear in some advanced data-structures (e.g., range trees) where two independent logarithmic dimensions are traversed.

Rules of thumb:

  1. Log factors come from repeatedly shrinking a quantity by a constant factor. Any loop of the form while x > 1: x ← x / c (for constant $c > 1$) takes $\Theta(\log x)$ steps.
  2. Nesting two independent loops multiplies their costs. An outer loop that counts $n$ times and an inner loop that counts $\log n$ times gives $n \cdot \log n$.
  3. Divide-and-conquer often yields $n \log n$. Splitting the problem into two sub-problems of half size and doing $\Theta(n)$ work to combine them gives the recurrence $T(n) = 2\,T(n/2) + \Theta(n)$, which the Master Theorem resolves to $\Theta(n \log n)$.
  4. Nested logarithmic loops stack. Two independent halving loops give $\log^2 n$; three give $\log^3 n$, and so on.
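
To make these rules tangible, here is a small illustrative Python sketch (function names are just for this demo) that counts loop iterations for the linear, logarithmic, and linearithmic patterns; notice how doubling n barely moves the logarithmic count but doubles the linear one:

def count_linear(n):
    steps = 0
    while n > 0:
        n -= 1                 # shrink by subtraction -> Θ(n)
        steps += 1
    return steps

def count_log(n):
    steps = 0
    while n > 0:
        n //= 2                # shrink by division -> Θ(log n)
        steps += 1
    return steps

def count_n_log_n(n):
    steps = 0
    for _ in range(n):         # outer loop: n times
        steps += count_log(n)  # inner loop: about log n each time
    return steps

for n in (16, 1024, 32768):
    print(n, count_linear(n), count_log(n), count_n_log_n(n))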

A “do” for interviews and real design reviews: walk through this reasoning out loud; it shows you understand growth, not just symbols. A “don’t” is to overfit to the notation; always tie the bound back to the loop behavior.

Misconceptions

A few misconceptions are worth calling out because they keep people from learning the useful parts: Big O is not an exact running-time formula, a better asymptotic bound does not always mean faster code in practice, and not every problem admits an $O(1)$ shortcut. The goal isn’t to become a theoretician; it’s to become someone who can write software that keeps working as the world scales. That’s why a little complexity intuition pays off so heavily.