Last modified: January 19, 2025
This article is written in: 🇺🇸
Graphs
In many areas of life, we come across systems where elements are deeply interconnected—whether through physical routes, digital networks, or abstract relationships. Graphs offer a flexible way to represent and make sense of these connections.
Some real-world examples include:
- Underground tunnel networks (subways and transportation systems below the city surface)
- Railway maps (train routes connecting different towns and cities)
- Cities linked by flights (air travel routes between global destinations)
- Networks of pipes (piping systems transporting water, gas, or oil)
- Electrical grids (networks distributing electricity across regions)
- Carbon atoms in a molecule (chemical compounds and the bonds between their constituent atoms)
- Internet (webpages interlinked through hyperlinks or networks of computers)
- Task scheduling (dependencies among tasks that determine their sequence or priority)
- Spread of a disease (understanding how diseases propagate through populations)
- Social networks (people connected through friendships, family ties, or professional relationships)
- Countries and their political alliances (diplomatic ties, trade partnerships, or defense pacts between nations)
Viewing these systems as graphs allows for deeper analysis, better optimization, and more accurate predictions of their behavior. In computer science and mathematics, graphs serve as a powerful data structure that provides the tools needed for this kind of modeling.
Graph Terminology
Graph theory has its own language, full of terms that make it easier to talk about and define different types of graphs and their features. Here's an explanation of some important terms:
- A graph $G$ is a mathematical structure consisting of a set $V(G)$ of vertices (also known as points or nodes) and a set $E(G)$ of pairs of these vertices. These pairs, denoted as ${x, y} \in E(G)$, are called edges (or links or lines).
- If ${x, y} \in E(G)$, then $x$ and $y$ are said to be adjacent. The number of vertices adjacent to a vertex $v$ defines its degree. Interestingly, the total degrees of all vertices in a graph is always even.
- A path of length $n$ is a series of vertices $v_1 \sim v_2 \sim \dots \sim v_{n+1}$ such that $(v_i, v_{i+1}) \in E(G)$ with all vertices in the sequence being distinct.
- A cycle is a special type of path where $v_1 = v_{n+1}$, and all other vertices from $v_1$ to $v_n$ are unique.
- Distance represents the length of the shortest path between two vertices. When assessing the distance between vertices $x$ and $y$, you are looking for the most direct route connecting them.
- A simple graph is a graph with no self-loops (edges connected at both ends to the same vertex) and at most one edge between any two vertices.
- In a directed graph (digraph), edges have direction. They are an ordered pair of vertices, often called arcs. In contrast, undirected graphs lack directionality, meaning an edge from vertex $A$ to vertex $B$ is the same as one from $B$ to $A$.
- A weighted graph assigns a weight to each edge, typically a non-negative integer. A binary weight represents the presence (1) or absence (0) of a connection between nodes. A numeric weight indicates the strength or cost of a connection. A normalized weight ensures that all outgoing edges from a node sum to one, often used in probabilistic contexts.
- In terms of connectivity, an undirected graph is termed connected if every vertex pair is linked by some path. In a directed graph, weakly connected means a path exists from vertex $A$ to $B$ or $B$ to $A$ for any vertices $A$ and $B$. Strongly connected means paths exist in both directions between every pair of vertices.
- Vertices $A$ and $B$, if connected by an edge $e$, are neighbors. This edge $e$ is incident to both $A$ and $B$. When two edges share a vertex (like edges from $A$ to $B$ and $B$ to $C$), they are adjacent.
- The concept of an isolated vertex refers to a vertex with no edges connected to it. This means the vertex has a degree of zero since it is not adjacent to any other vertex.
- Subgraphs are smaller graphs formed from a subset of the vertices and edges of a larger graph. If $G$ is a graph, a subgraph $H$ is a graph such that the vertex set of $H$ is a subset of the vertex set of $G$ and the edge set of $H$ is a subset of the edge set of $G$.
- A spanning tree of a graph $G$ is a subgraph that includes all the vertices of $G$ and is a single connected tree. A spanning tree has exactly $|V| - 1$ edges where $|V|$ is the number of vertices in the graph.
- Bipartite graphs are graphs whose vertices can be divided into two disjoint sets such that no two vertices within the same set are adjacent. Formally, a graph $G$ is bipartite if there exists a partition $V(G) = V_1 \cup V_2$ where $V_1$ and $V_2$ are disjoint and every edge connects a vertex in $V_1$ to a vertex in $V_2$.
- A complete graph is a simple graph in which there is an edge between every pair of vertices. If a complete graph has $n$ vertices, it is denoted by $K_n$ and has $\frac{n(n-1)}{2}$ edges.
- Planar graphs can be drawn on a plane without any edges crossing. A graph is planar if it can be embedded in the plane, meaning it can be drawn on a flat surface such that its edges intersect only at their endpoints.
- The Eulerian path is a path in a graph that visits every edge exactly once. If such a path exists and starts and ends at different vertices, it is an Eulerian path. If it starts and ends at the same vertex, it is an Eulerian circuit. A graph has an Eulerian circuit if and only if all vertices have an even degree, and it is connected. A graph has an Eulerian path if it has exactly zero or two vertices of odd degree, and it is connected.
- The Hamiltonian path is a path in a graph that visits every vertex exactly once. If such a path exists and starts and ends at different vertices, it is a Hamiltonian path. If it starts and ends at the same vertex, it is a Hamiltonian circuit. Determining whether a Hamiltonian path or circuit exists in a graph is a well-known NP-complete problem.
- Graph isomorphism refers to a condition where two graphs $G$ and $H$ are considered isomorphic if there is a one-to-one correspondence between their vertex sets and their edge sets that preserves adjacency. In other words, $G$ and $H$ are structurally identical but may have different vertex and edge labels.
- Degree sequence is a list of the degrees of the vertices in a graph, usually written in non-increasing order. The degree sequence is a graph invariant, meaning it is the same for any isomorphic graphs.
- Graph coloring is the assignment of labels (colors) to vertices of a graph such that no two adjacent vertices share the same color. The minimum number of colors needed to color a graph $G$ is known as its chromatic number.
- Tree is a connected, acyclic graph. In a tree, any two vertices are connected by exactly one path, and adding any edge to a tree will create a cycle. Trees have several important properties and applications in computer science, particularly in data structures and algorithms.
- Forest is a disjoint set of trees. It is an acyclic graph that may consist of multiple disconnected components, each being a tree.
- Cut vertices and cut edges refer to vertices and edges whose removal increases the number of connected components of the graph. A cut vertex (or articulation point) is a vertex that, if removed, would split the graph into two or more disconnected subgraphs. A cut edge (or bridge) is an edge that, if removed, would increase the number of connected components.
- Matching in a graph is a set of edges without common vertices. A maximum matching is a matching that contains the largest possible number of edges. If every vertex of the graph is incident to exactly one edge of the matching, it is called a perfect matching.
- Independent set is a set of vertices in a graph, no two of which are adjacent. The size of the largest independent set is called the independence number of the graph.
- Clique is a subset of vertices such that every two distinct vertices are adjacent. The size of the largest clique in a graph is called the clique number.
- Vertex cover is a set of vertices such that each edge of the graph is incident to at least one vertex of the set. The size of the smallest vertex cover is called the vertex cover number.
- Dominating set is a set of vertices such that each vertex in the graph is either in the dominating set or adjacent to a vertex in the dominating set. The size of the smallest dominating set is called the domination number.
- Planarity testing determines whether a graph can be drawn in a plane without edge crossings. Kuratowski's theorem provides a criterion for planarity based on the presence of subgraphs homeomorphic to $K_5$ (complete graph on five vertices) or $K_{3,3}$ (complete bipartite graph on six vertices, three in each set).
- Graph algorithms encompass a variety of procedures and techniques used to solve problems related to graphs. Examples include depth-first search (DFS), breadth-first search (BFS), Dijkstra's algorithm for shortest paths, Kruskal's and Prim's algorithms for minimum spanning trees, and the Bellman-Ford algorithm for shortest paths in graphs with negative weights.
Representation of Graphs in Computer Memory
Graphs, with their versatile applications in numerous domains, necessitate efficient storage and manipulation mechanisms in computer memory. The choice of representation often depends on the graph's characteristics, such as sparsity, and the specific operations to be performed. Among the various methods available, the adjacency matrix and the adjacency list are the most prevalent.
Adjacency Matrix
An adjacency matrix represents a graph $G$ as a two-dimensional matrix. Given $V$ vertices, it utilizes a $V \times V$ matrix $A$. The rows and columns correspond to the graph's vertices, and each cell $A_{ij}$ holds:
1
if there is an edge between vertex $i$ and vertex $j$0
if no such edge exists
For graphs with edge weights, $A_{ij}$ contains the weight of the edge between vertices $i$ and $j$.
Example:
A | B | C | D | |
A | 0 | 1 | 0 | 1 |
B | 1 | 0 | 1 | 0 |
C | 0 | 1 | 0 | 1 |
D | 1 | 0 | 1 | 0 |
Here, the matrix indicates a graph with vertices A to D. For instance, vertex A connects with vertices B and D, hence the respective 1s in the matrix.
Benefits:
- Fixed-time ($O(1)$) edge existence checks.
- Particularly suitable for dense graphs, where the edge-to-vertex ratio is high.
Drawbacks:
- Consumes more space for sparse graphs.
- Traversing neighbors can be slower due to the need to check all vertices.
Adjacency List
An adjacency list uses a collection (often an array or a linked list) to catalog the neighbors of each vertex. Each vertex points to its own list, enumerating its direct neighbors.
Example:
A -> [B, D]
B -> [A, C]
C -> [B, D]
D -> [A, C]
This list reflects the same graph as our matrix example. Vertex A's neighbors, for instance, are B and D.
Benefits:
- Space-efficient for sparse graphs, where edges are relatively fewer.
- Facilitates faster traversal of a vertex's neighbors since the direct neighbors are listed without extraneous checks.
Drawbacks:
- Edge existence checks can take up to $O(V)$ time in the worst case.
- Potentially consumes more space for dense graphs.
In practice, the choice between these (and other) representations often hinges on the graph's characteristics and the specific tasks or operations envisioned.
Planarity
Planarity is a fundamental concept in graph theory that examines whether a graph can be drawn on a flat surface without any of its edges overlapping. This idea holds significant importance in areas such as circuit design, urban planning, and geography.
What is a Planar Graph?
A graph is considered "planar" if there exists a representation of it on a two-dimensional plane where its edges intersect only at their vertices and nowhere in between. Being able to redraw a graph without any edges crossing, even if it starts out messy with overlaps, is what shows if it’s planar or not.
Planar Embedding
A "planar embedding" refers to a way of drawing a graph on a plane so that none of its edges cross each other. If a graph is first drawn with overlapping edges, it’s considered planar if you can rearrange it so no edges cross.
Examples
To understand planarity better, let's explore some examples:
- Cycle Graphs: Graphs like triangles, squares, or hexagons, which are simple loops or cycles, are intrinsically planar. These shapes can always be illustrated without intersecting edges.
A-----B
| |
C-----D
- Complete Graph with Four Vertices ($K_4$): This graph has every vertex connected to all other vertices. Despite its complexity, $K_4$ remains planar and resembles a tetrahedron.
#
A--------
/ \ |
B---C |
\ / |
D--------
- Complete Graph with Five Vertices ($K_5$): Unlike $K_4$, $K_5$ cannot be sketched without crossing edges, thus classifying it as non-planar.
#
- A -
/ / \ \
/ | | \
B----+-----+---C
\ | | /
\ D-----E /
In the $K_5$ graph, edges like AD and AE overlap with BC.
Strategies for Assessing Planarity
- The planarity of a graph refers to whether it can be drawn on a flat surface without any edges crossing each other.
- Small graphs can be tested for planarity by manually rearranging their vertices and edges to check if a crossing-free drawing is possible.
- Kuratowski's theorem states that a graph is planar if it does not contain a subgraph that can be transformed into $K_5$ (a graph with five vertices all connected to each other) or $K_{3,3}$ (a graph with two groups of three vertices, where every vertex in one group connects to every vertex in the other).
- $K_5$ is a complete graph with five vertices where every pair of vertices has a direct edge connecting them.
- $K_{3,3}$ is a bipartite graph where two sets of three vertices are connected such that each vertex in the first set is connected to all vertices in the second set, with no edges within the same set.
- Wagner’s theorem provides an alternative way to determine planarity, stating that a graph is planar if it does not have $K_5$ or $K_{3,3}$ as a "minor." A minor is a smaller graph formed by deleting edges, deleting vertices, or merging connected vertices.
- For larger graphs, manual testing becomes impractical, and planarity algorithms are often used instead.
- The Hopcroft-Tarjan algorithm is a linear-time method for testing planarity. It uses depth-first search to efficiently decide if a graph can be drawn without crossing edges.
- The Boyer-Myrvold algorithm is another linear-time approach that tests planarity and can also provide an embedding of the graph (a specific way to draw it without crossings) if it is planar.
- Both algorithms are widely used in computer science for applications that involve networks, circuit design, and data visualization, where planarity helps simplify complex structures.
Traversals
- When we traverse a graph, we visit its vertices in an organized way to make sure we don’t miss any vertices or edges.
- Graphs, unlike trees, don’t have a single starting point like a root. This means we either need to be given a starting vertex or pick one randomly.
- Let’s say we start from a specific vertex, like $i$. From there, the traversal explores all connected vertices according to the rules of the chosen method.
- In both breadth-first search (BFS) and depth-first search (DFS), the order of visiting vertices depends on how the algorithm is implemented.
- For example, if the starting vertex A has three neighbors, like $C, F,$ and $G$, the algorithm doesn’t have a fixed rule for which neighbor to visit first. It could choose any of them based on the way it’s programmed.
- Because of this flexibility, we talk about a result of BFS or DFS, rather than the result, since different implementations might visit vertices in different orders.
Breadth-First Search (BFS)
Breadth-First Search (BFS) is a fundamental graph traversal algorithm that explores the vertices of a graph in layers, starting from a specified source vertex. It progresses by visiting all immediate neighbors of the starting point, then the neighbors of those neighbors, and so on.
To efficiently keep track of the traversal, BFS employs two primary data structures:
- A queue, typically named
unexplored
orqueue
, to store nodes that are pending exploration. - A hash table or a set called
visited
to ensure that we do not revisit nodes.
Algorithm Steps
- Begin from a starting vertex, $i$.
- Mark the vertex $i$ as visited.
- Explore each of its neighbors. If the neighbor hasn't been visited yet, mark it as visited and enqueue it in
unexplored
. - Dequeue the front vertex from
unexplored
and repeat step 3. - Continue this process until the
unexplored
queue becomes empty.
To ensure the algorithm doesn't fall into an infinite loop due to cycles in the graph, it could be useful to mark nodes as visited as soon as they are enqueued. This prevents them from being added to the queue multiple times.
Example
Queue: Empty Visited: A, B, C, D, E
A
/ \
B C
| |
D E
In this example, BFS started at the top of the graph and worked its way down, visiting nodes in order of their distance from the starting node. The ASCII representation provides a step-by-step visualization of BFS using a queue and a list of visited nodes.
Applications
BFS is not only used for simple graph traversal. Its applications span multiple domains:
- BFS can determine the shortest path in an unweighted graph from a source to all other nodes.
- To find all connected components in an undirected graph, you can run BFS on every unvisited node.
- BFS mirrors the propagation in broadcasting networks, where a message is forwarded to neighboring nodes, and they subsequently forward it to their neighbors.
- If during BFS traversal, an already visited node is encountered (and it's not the parent of the current node in traversal), then there exists a cycle in the graph.
Implementation
Depth-First Search (DFS)
Depth-First Search (DFS) is another fundamental graph traversal algorithm, but unlike BFS which traverses level by level, DFS dives deep into the graph, exploring as far as possible along each branch before backtracking.
To implement DFS, we use two main data structures:
- A stack, either implicitly using the call stack through recursion or explicitly using a data structure. This stack is responsible for tracking vertices that are to be explored.
- A hash table or set called
visited
to ensure nodes aren't revisited.
Algorithm Steps
- Begin from a starting vertex, $i$.
- Mark vertex $i$ as visited.
- Visit an unvisited neighbor of $i$, mark it as visited, and move to that vertex.
- Repeat the above step until the current vertex has no unvisited neighbors.
- Backtrack to the previous vertex and explore other unvisited neighbors.
- Continue this process until you've visited all vertices connected to the initial start vertex.
Marking nodes as visited as soon as you encounter them is important to avoid infinite loops, particularly in graphs with cycles.
Example
Stack: Empty Visited: A, B, D, C, E
A
/ \
B C
| |
D E
In this example, DFS explored as deep as possible along the left side (branch with B and D) of the graph before backtracking and moving to the right side (branch with C and E). The ASCII representation provides a step-by-step visualization of DFS using a stack and a list of visited nodes.
Applications
DFS, with its inherent nature of diving deep, has several intriguing applications:
- Topological Sorting is used in scheduling tasks, where one task should be completed before another starts.
- To find all strongly connected components in a directed graph.
- DFS can be employed to find a path between two nodes, though it might not guarantee the shortest path.
- If during DFS traversal, an already visited node is encountered (and it's not the direct parent of the current node in traversal), then there's a cycle in the graph.
Implementation
Shortest paths
A common task when dealing with weighted graphs is to find the shortest route between two vertices, such as from vertex $A$ to vertex $B$. Note that there might not be a unique shortest path, since several paths could have the same length.
Dijkstra's Algorithm
- Dijkstra's algorithm is a method to find the shortest paths from a starting vertex to all other vertices in a weighted graph.
- A weighted graph is one where each edge has a numerical value (cost, distance, or time).
- The algorithm starts at a starting vertex, often labeled A, and computes the shortest path to every other vertex.
- It keeps a tentative distance for each vertex, representing the current known shortest distance from the start.
- It repeatedly selects the vertex with the smallest tentative distance that hasn't been finalized (or "finished") yet.
- Once a vertex is selected, the algorithm relaxes all its edges: it checks if going through this vertex offers a shorter path to its neighbors.
- This continues until all vertices are processed, yielding the shortest paths from the starting vertex to every other vertex.
- Important: Dijkstra’s algorithm requires non-negative edge weights, or else results can be incorrect.
Algorithm Steps
Input
- A weighted graph where each edge has a cost or distance
- A starting vertex
A
Output
- An array
distances
wheredistances[v]
is the shortest distance fromA
to vertexv
Containers and Data Structures
- An array
distances
, initialized to∞
for all vertices exceptA
, which is set to0
- A hash table
finished
to mark vertices with confirmed shortest paths - A priority queue to efficiently select the vertex with the smallest current distance
Steps
I. Initialize distances[A]
to 0
II. Initialize distances[v]
to ∞
for every other vertex v
III. While not all vertices are marked as finished
- Select vertex
u
with the smallestdistances[u]
among unfinished vertices - Mark
finished[u]
astrue
- For each neighbor
w
ofu
, ifdistances[u] + weights[u][w]
is less thandistances[w]
, updatedistances[w]
todistances[u] + weights[u][w]
Step by Step Example
Consider a graph with vertices A, B, C, D, and E, and edges:
A-B: 4
A-C: 2
C-B: 1
B-D: 5
C-D: 8
C-E: 10
D-E: 2
The adjacency matrix looks like this (∞ means no direct edge):
A | B | C | D | E | |
A | 0 | 4 | 2 | ∞ | ∞ |
B | 4 | 0 | 1 | 5 | ∞ |
C | 2 | 1 | 0 | 8 | 10 |
D | ∞ | 5 | 8 | 0 | 2 |
E | ∞ | ∞ | 10 | 2 | 0 |
Starting from A, here’s how Dijkstra’s algorithm proceeds:
I. Initialize all distances with ∞ except A=0:
A: 0
B: ∞
C: ∞
D: ∞
E: ∞
II. From A (distance 0), update neighbors:
A: 0
B: 4 (via A)
C: 2 (via A)
D: ∞
E: ∞
III. Pick the smallest unvisited vertex (C with distance 2). Update its neighbors:
- B can be updated to 3 if 2 + 1 < 4
- D can be updated to 10 if 2 + 8 < ∞
- E can be updated to 12 if 2 + 10 < ∞
A: 0
B: 3 (via C)
C: 2
D: 10 (via C)
E: 12 (via C)
IV. Pick the next smallest unvisited vertex (B with distance 3). Update its neighbors:
- D becomes 8 if 3 + 5 < 10
- E remains 12 (no direct edge from B to E)
A: 0
B: 3
C: 2
D: 8 (via B)
E: 12
V. Pick the next smallest unvisited vertex (D with distance 8). Update its neighbors:
- E becomes 10 if 8 + 2 < 12
A: 0
B: 3
C: 2
D: 8
E: 10 (via D)
VI. The only remaining vertex is E (distance 10). No further updates are possible.
Final shortest paths from A:
A: 0
B: 3
C: 2
D: 8
E: 10
Optimizing Time Complexity
- A basic (array-based) implementation of Dijkstra's algorithm runs in O(n^2) time.
- Using a priority queue (min-heap) to select the vertex with the smallest distance reduces the complexity to O((V+E) log V), where V is the number of vertices and E is the number of edges.
Applications
- Internet routing protocols use it to determine efficient paths for data packets.
- Mapping software (e.g., Google Maps, Waze) employ variations of Dijkstra to compute driving routes.
- Telecommunication networks use it to determine paths with minimal cost.
Implementation
Bellman-Ford Algorithm
- Bellman-Ford algorithm is a method for finding the shortest paths from a single starting vertex to all other vertices in a weighted graph.
- Unlike Dijkstra’s algorithm, Bellman-Ford can handle negative edge weights, making it more flexible for certain types of graphs.
- The algorithm works by repeatedly relaxing all edges in the graph. Relaxing an edge means updating the current shortest distance to a vertex if a shorter path is found via another vertex.
- The algorithm performs this relaxation process exactly $V - 1$ times, where $V$ is the number of vertices. This ensures that every possible shortest path is discovered.
- After completing $V - 1$ relaxations, the algorithm does one more pass to detect negative weight cycles. If any edge can still be relaxed, a negative cycle exists and no finite shortest path is defined.
- Bellman-Ford’s time complexity is $O(V \times E)$, which is generally slower than Dijkstra’s algorithm for large graphs.
Algorithm Steps
Input
- A weighted graph with possible negative edge weights
- A starting vertex
A
Output
- An array
distances
wheredistances[v]
represents the shortest path fromA
to vertexv
Containers and Data Structures
- An array
distances
, set to∞
for all vertices except the start vertex (set to0
) - A
predecessor
array to help reconstruct the actual shortest path
Steps
I. Initialize distances[A]
to 0
and distances[v]
to ∞
for all other vertices v
II. Repeat V - 1
times
- For every edge
(u, v)
with weightw
, ifdistances[u] + w < distances[v]
, updatedistances[v]
todistances[u] + w
andpredecessor[v]
tou
III. Check for negative cycles by iterating over all edges (u, v)
again
- If
distances[u] + w < distances[v]
for any edge, a negative weight cycle exists
Step by Step Example
We have vertices A, B, C, D, and E. The edges and weights (including a self-loop on E):
A-B: 6
A-C: 7
B-C: 8
B-D: -4
B-E: 5
C-E: -3
D-A: 2
D-C: 7
E-E: 9
Adjacency matrix (∞ means no direct edge):
A | B | C | D | E | |
A | 0 | 6 | 7 | ∞ | ∞ |
B | ∞ | 0 | 8 | -4 | 5 |
C | ∞ | ∞ | 0 | ∞ | -3 |
D | 2 | ∞ | 7 | 0 | ∞ |
E | ∞ | ∞ | ∞ | ∞ | 9 |
Initialization:
dist[A] = 0
dist[B] = ∞
dist[C] = ∞
dist[D] = ∞
dist[E] = ∞
Iteration 1 (relax edges from A):
dist[B] = 6
dist[C] = 7
Iteration 2 (relax edges from B, then C):
dist[D] = 2 (6 + (-4))
dist[E] = 11 (6 + 5)
dist[E] = 4 (7 + (-3)) // C → E is better
Iteration 3 (relax edges from D):
dist[A] = 4 (2 + 2)
(No update for C since dist[C]=7 is already < 9)
Iteration 4:
No changes in this round
Final distances from A:
dist[A] = 0
dist[B] = 6
dist[C] = 7
dist[D] = 2
dist[E] = 4
Special Characteristics
- It can manage negative edge weights but cannot produce valid results when negative cycles are present.
- It is often used when edges can be negative, though it is slower than Dijkstra’s algorithm.
Applications
- Financial arbitrage detection in currency exchange markets.
- Routing in networks where edges might have negative costs.
- Game development scenarios with penalties or negative terrain effects.
Implementation
A* (A-Star) Algorithm
- A* is an informed search algorithm used for pathfinding and graph traversal.
- It is a best-first search because it prioritizes the most promising paths first, combining known and estimated costs.
- The algorithm relies on:
- g(n): The actual cost from the start node to the current node n.
- h(n): A heuristic estimating the cost from n to the goal.
- The total cost function is f(n) = g(n) + h(n), guiding the search toward a potentially optimal path.
- At each step, A expands the node with the lowest f(n)* in the priority queue.
- The heuristic h(n) must be admissible (never overestimates the real cost) to guarantee an optimal result.
- A terminates when it either reaches the goal* or exhausts all possibilities if no solution exists.
- It is efficient for many applications because it balances exploration with being goal-directed, but its performance depends on the heuristic quality.
- A is broadly used in games, robotics, and navigation* due to its effectiveness in real-world pathfinding.
Algorithm Steps
Input
- A graph
- A start vertex
A
- A goal vertex
B
- A heuristic function
h(v)
that estimates the cost fromv
toB
Output
- The shortest path from
A
toB
if one exists
Used Data Structures
I. g(n): The best-known cost from the start vertex to vertex n
II. h(n): The heuristic estimate from vertex n
to the goal
III. f(n) = g(n) + h(n): The estimated total cost from start to goal via n
IV. openSet: Starting with the initial node, contains nodes to be evaluated
V. closedSet: Contains nodes already fully evaluated
VI. cameFrom: Structure to record the path taken
Steps
I. Add the starting node to the openSet
II. While the openSet is not empty
- Get the node
current
in openSet with the lowest f(n) - If
current
is the goal node, reconstruct the path and return it - Remove
current
from openSet and add it to closedSet - For each neighbor
n
ofcurrent
, skip it if it is in closedSet - If
n
is not in openSet, add it and compute g(n), h(n), and f(n) - If a better path to
n
is found, update cameFrom forn
III. If the algorithm terminates without finding the goal, no path exists
Step by Step Example
We have a graph with vertices A, B, C, D, and E:
A-B: 1
A-C: 2
B-D: 3
C-D: 2
D-E: 1
Heuristic estimates to reach E:
h(A) = 3
h(B) = 2
h(C) = 2
h(D) = 1
h(E) = 0
Adjacency matrix (∞ = no direct path):
A | B | C | D | E | |
A | 0 | 1 | 2 | ∞ | ∞ |
B | ∞ | 0 | ∞ | 3 | ∞ |
C | ∞ | ∞ | 0 | 2 | ∞ |
D | ∞ | ∞ | ∞ | 0 | 1 |
E | ∞ | ∞ | ∞ | ∞ | 0 |
Initialization:
g(A) = 0
f(A) = g(A) + h(A) = 0 + 3 = 3
openSet = [A]
closedSet = []
Expand A:
f(B) = 0 + 1 + 2 = 3
f(C) = 0 + 2 + 2 = 4
Expand B next (lowest f=3):
f(D) = g(B) + cost(B,D) + h(D) = 1 + 3 + 1 = 5
Next lowest is C (f=4):
f(D) = g(C) + cost(C,D) + h(D) = 2 + 2 + 1 = 5 (no improvement)
Expand D (f=5):
f(E) = g(D) + cost(D,E) + h(E) = 5 + 1 + 0 = 6
E is the goal; algorithm stops.
Resulting path: A -> B -> D -> E with total cost 5.
Special Characteristics
- A* finds an optimal path if the heuristic is admissible.
- Edges must have non-negative weights for A* to work correctly.
- A good heuristic drastically improves its efficiency.
Applications
- Used in video games for enemy AI or player navigation.
- Employed in robotics for motion planning.
- Integral to mapping and GPS systems for shortest route calculations.
Implementation
Minimal Spanning Trees
Suppose we have a graph that represents a network of houses. Weights represent the distances between vertices, which each represent a single house. All houses must have water, electricity, and internet, but we want the cost of installation to be as low as possible. We need to identify a subgraph of our graph with the following properties:
- There are no cycles in the graph.
- All vertices are connected.
- The total sum of weights is minimum.
Such a subgraph is called a minimal spanning tree.
Prim's Algorithm
- Prim's Algorithm is used to find a minimum spanning tree (MST), which is a subset of a graph that connects all its vertices with the smallest total edge weight.
- It works on a weighted undirected graph, meaning the edges have weights, and the direction of edges doesn’t matter.
- It starts with an arbitrary vertex and grows the MST by adding one edge at a time.
- At each step, it chooses the smallest weight edge that connects a vertex in the MST to a vertex not yet in the MST (a greedy approach).
- This process continues until all vertices are included.
- The resulting MST is connected, ensuring a path between any two vertices, and the total edge weight is minimized.
- Using a priority queue (min-heap), it can achieve a time complexity of O(E log V) with adjacency lists, where E is the number of edges and V is the number of vertices.
- With an adjacency matrix, the algorithm can be implemented in O(V^2) time.
Algorithm Steps
Input
- A connected, undirected graph with weighted edges
- A start vertex
A
Output
- A minimum spanning tree, which is a subset of the edges that connects all vertices together without any cycles and with the minimum total edge weight
Containers and Data Structures
- An array
key[]
to store the minimum reachable edge weight for each vertex. Initially,key[v] = ∞
for allv
except the first chosen vertex (set to0
) - A boolean array
mstSet[]
to keep track of whether a vertex is included in the MST. Initially, all values arefalse
- An array
parent[]
to store the MST. Eachparent[v]
indicates the vertex connected tov
in the MST
Steps
I. Start with an arbitrary node as the initial MST node
II. While there are vertices not yet included in the MST
- Pick a vertex
v
with the smallestkey[v]
- Include
v
inmstSet[]
- For each neighboring vertex
u
ofv
not in the MST - If the weight of edge
(u, v)
is less thankey[u]
, updatekey[u]
and setparent[u]
tov
III. The MST is formed using the parent[]
array once all vertices are included
Step by Step Example
Consider a simple graph with vertices A, B, C, D, and E. The edges with weights are:
A-B: 2
A-C: 3
B-D: 1
B-E: 3
C-D: 4
C-E: 5
D-E: 2
The adjacency matrix for the graph (using ∞ where no direct edge exists) is:
A | B | C | D | E | |
A | 0 | 2 | 3 | ∞ | ∞ |
B | 2 | 0 | ∞ | 1 | 3 |
C | 3 | ∞ | 0 | 4 | 5 |
D | ∞ | 1 | 4 | 0 | 2 |
E | ∞ | 3 | 5 | 2 | 0 |
Run Prim's algorithm starting from vertex A:
I. Initialization
Chosen vertex: A
Not in MST: B, C, D, E
II. Pick the smallest edge from A
Closest vertex is B with a weight of 2.
MST now has: A, B
Not in MST: C, D, E
III. From A and B, pick the smallest edge
Closest vertex is D (from B) with a weight of 1.
MST now has: A, B, D
Not in MST: C, E
IV. Next smallest edge from A, B, or D
Closest vertex is E (from D) with a weight of 2.
MST now has: A, B, D, E
Not in MST: C
V. Pick the final vertex
The closest remaining vertex is C (from A) with a weight of 3.
MST now has: A, B, D, E, C
The MST includes the edges: A-B (2), B-D (1), D-E (2), and A-C (3), with a total weight of 8.
Special Characteristics
- It always selects the smallest edge that can connect a new vertex to the existing MST.
- Different choices of starting vertex can still result in the same total MST weight (though the exact edges might differ if multiple edges have the same weight).
- With adjacency lists and a priority queue, the time complexity is O(E log V); with an adjacency matrix, it is O(V^2).
Applications
- Network design: Building telecommunication networks with minimal cable length.
- Road infrastructure: Constructing roads, tunnels, or bridges at minimal total cost.
- Utility services: Designing water, electrical, or internet infrastructure to connect all locations at minimum cost.
Implementation
Kruskal's Algorithm
- Kruskal's Algorithm is used to find a minimum spanning tree (MST) in a connected, undirected graph with weighted edges.
- It sorts all edges from smallest to largest by weight.
- It adds edges one by one to the MST if they do not form a cycle.
- Cycle detection is managed by a disjoint-set (union-find) data structure, which helps quickly determine if two vertices belong to the same connected component.
- If adding an edge connects two different components, it is safe to include; if both vertices are already in the same component, including that edge would create a cycle and is skipped.
- The process continues until the MST has V-1 edges, where V is the number of vertices.
- Its time complexity is O(E \log E), dominated by sorting the edges, while union-find operations typically take near-constant time (O(α(V)), where α is the inverse Ackermann function).
Algorithm Steps
Input
- A connected, undirected graph with weighted edges
Output
- A subset of edges forming a MST, ensuring all vertices are connected with no cycles and minimal total weight
Containers and Data Structures
- A list or priority queue to sort the edges by weight
- A
disjoint-set (union-find)
structure to manage and merge connected components
Steps
I. Sort all edges in increasing order of their weights
II. Initialize a forest where each vertex is its own tree
III. Iterate through the sorted edges
- If the edge
(u, v)
connects two different components, include it in the MST and perform aunion
of the sets - If it connects vertices in the same component, skip it
IV. Once V-1
edges have been added, the MST is complete
Step by Step Example
Consider a graph with vertices A, B, C, D, and E. The weighted edges are:
A-B: 2
A-C: 3
B-D: 1
B-E: 3
C-D: 4
C-E: 5
D-E: 2
The adjacency matrix (∞ indicates no direct edge):
A | B | C | D | E | |
A | 0 | 2 | 3 | ∞ | ∞ |
B | 2 | 0 | ∞ | 1 | 3 |
C | 3 | ∞ | 0 | 4 | 5 |
D | ∞ | 1 | 4 | 0 | 2 |
E | ∞ | 3 | 5 | 2 | 0 |
Sort edges by weight:
B-D: 1
A-B: 2
D-E: 2
A-C: 3
B-E: 3
C-D: 4
C-E: 5
- Pick B-D (1): Include it. MST has {B-D}, weight = 1.
- Pick A-B (2): Include it. MST has {B-D, A-B}, weight = 3.
- Pick D-E (2): Include it. MST has {B-D, A-B, D-E}, weight = 5.
- Pick A-C (3): Include it. MST has {B-D, A-B, D-E, A-C}, weight = 8.
- Pick B-E (3): Would form a cycle (B, D, E already connected), skip.
- Pick C-D (4): Would form a cycle (C, D already connected), skip.
- Pick C-E (5): Would form a cycle as well, skip.
The MST edges are B-D, A-B, D-E, and A-C, total weight = 8.
Special Characteristics
- It always picks the smallest available edge that won't create a cycle.
- In case of a tie, any equally weighted edge can be chosen.
- The approach is particularly efficient for sparse graphs.
- Sorting edges takes O(E \log E) time, and disjoint-set operations can be considered almost O(1) on average.
Applications
- Network design: Connecting servers or cities using minimal cable length.
- Infrastructure: Building road systems, water lines, or power grids with the smallest total cost.
- Any MST requirement: Ensuring connectivity among all nodes at minimum cost.