Last modified: August 31, 2025
In computer science, a collection (often interchangeably referred to as a container) is a data structure designed to hold multiple entities; these could be simple elements like numbers or text strings, or more complex objects like user-defined structures. Collections help you store, organize, and manipulate different types of data in your programs.
Knowing the common collections well helps you pick the right one for the job, making your code faster, more efficient, and easier to maintain in the long run.
A linked list is a basic way to organize data as a chain of nodes. Each node contains two distinct parts:

- the data itself (a value or a more complex object), and
- a reference (or pointer) to the next node in the chain.

Importantly, the last node in the linked list points to a null value, effectively signifying the end of the list.
Linked lists can be highly valuable when the data doesn't require contiguous storage in memory or when the exact size of the data is undetermined at the time of memory allocation. While they can serve as an alternative to arrays, it's essential to note that linked lists generally have slower access times and are more challenging to manipulate due to their non-contiguous storage and absence of direct access to individual elements.
[HEAD]-->[1| ]-->[2| ]-->[3| ]-->[4| ]-->[NULL]
Here are some standard operations associated with linked lists:
- `is_empty()`: checks if the list is devoid of any elements.
- `front()`: retrieves the first element in the list.
- `append(element)`: adds a new element to the end of the list.
- `remove(index)`: removes the element at a specified index.
- `replace(index, data)`: substitutes the element at a specified index with a new value.
| Operation | Average case | Worst case |
| --------- | ------------ | ---------- |
| Access    | O(n)         | O(n)       |
| Insert    | O(1)         | O(1)       |
| Delete    | O(1)         | O(1)       |
| Search    | O(n)         | O(n)       |
Note that insertion and deletion are constant-time operations once you hold a reference to the relevant node, while access and search require traversing the list, resulting in linear time complexity.
Linked lists find utility in a wide variety of areas including, but not limited to, implementing stacks and queues, maintaining sequences with frequent insertions and deletions, representing adjacency lists in graphs, and managing free blocks in memory allocators.
Linked lists can be implemented in various ways, largely dependent on the programming language used. A conventional implementation strategy involves defining a custom class or structure to represent a node. This class encapsulates a value and a reference to the next node. Additionally, methods for adding, removing, and accessing elements are provided to manipulate the list.
In some languages, linked lists are provided as part of the standard library, abstracting away the underlying implementation details and providing a rich set of functionality.
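To make the idea concrete, here is a minimal singly linked list sketch in Python. The class and method names are illustrative rather than taken from any particular library, and error handling is kept to the essentials:

```python
class Node:
    """A single link in the chain: a value plus a reference to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None  # the last node keeps this as None (the "null" end marker)

class LinkedList:
    def __init__(self):
        self.head = None  # empty list: head points at nothing

    def is_empty(self):
        return self.head is None

    def front(self):
        if self.is_empty():
            raise IndexError("front() on an empty list")
        return self.head.data

    def append(self, element):
        node = Node(element)
        if self.is_empty():
            self.head = node
            return
        current = self.head
        while current.next is not None:  # walk to the last node
            current = current.next
        current.next = node

    def remove(self, index):
        if self.is_empty():
            raise IndexError("remove() on an empty list")
        if index == 0:
            self.head = self.head.next
            return
        current = self.head
        for _ in range(index - 1):       # stop at the node before the target
            if current.next is None:
                raise IndexError("index out of range")
            current = current.next
        if current.next is None:
            raise IndexError("index out of range")
        current.next = current.next.next

numbers = LinkedList()
numbers.append(1)
numbers.append(2)
numbers.append(3)
numbers.remove(1)        # drops the 2
print(numbers.front())   # 1
```

Note how `append` has to traverse the whole chain: that is exactly the O(n) access cost from the table above, and it is why many implementations also keep a tail pointer.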
In computer science, a vector can be thought of as a dynamic array with the ability to adjust its size as required. While similar to traditional arrays, vectors distinguish themselves with their superior flexibility and efficiency in numerous scenarios. They store elements in contiguous blocks of memory, ensuring swift access to elements through index-based referencing.
Vectors are often favored over low-level arrays due to their augmented feature set and integrated safety mechanisms. These characteristics contribute to their easier usability and overall reliability.
-------------------------
| 0 | 1 | 2 | 3 | 4 | 5 |
-------------------------
Some of the commonly used operations for vectors are:
- `sort(start, end)`: sorts the elements in the specified range within the vector.
- `reverse(start, end)`: reverses the order of elements between the specified positions.
- `size()`: retrieves the current number of elements in the vector.
- `capacity()`: retrieves the maximum number of elements the vector can hold before needing to resize.
- `resize(new_size)`: alters the vector's size, adding or removing elements as necessary.
- `reserve(new_capacity)`: increases the vector's capacity, if required, without changing its size.
| Operation | Average case | Worst case |
| --------- | ------------ | ---------- |
| Access    | O(1)         | O(1)       |
| Insert    | O(1)         | O(n)       |
| Delete    | O(1)         | O(n)       |
| Search    | O(n)         | O(n)       |
While access to elements is a constant-time operation, the cost of insertions and deletions depends on where they are performed: at the end they take (amortized) constant time, but anywhere else the elements after the insertion point must be shifted, resulting in linear time complexity.
Vectors can be leveraged for a wide array of tasks, such as storing sequences whose length is not known in advance, building buffers for I/O, and serving as the backing store for higher-level structures like stacks and heaps.
Vectors are typically implemented using an array of elements accompanied by additional metadata such as the current size and total capacity. When a resize operation is necessary, a new, larger array is created, and the existing elements are copied over to the new array. As resizing can be time and resource-consuming, vectors often allocate extra memory ahead of time to minimize the frequency of resizing.
In some programming languages, vectors might internally use other data structures like linked lists. However, regardless of the underlying implementation, the interface presented to the programmer and the time complexity of the operations remain similar.
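The resize-and-copy mechanics described above can be sketched in a few lines of Python. This is a simplified illustration, not how production containers such as C++'s std::vector are actually written; the growth factor of 2 is one common choice:

```python
class DynamicArray:
    def __init__(self):
        self._capacity = 4                       # allocated slots
        self._size = 0                           # slots actually in use
        self._data = [None] * self._capacity

    def size(self):
        return self._size

    def capacity(self):
        return self._capacity

    def append(self, element):
        if self._size == self._capacity:
            self._grow(2 * self._capacity)       # full: allocate a bigger block
        self._data[self._size] = element
        self._size += 1

    def _grow(self, new_capacity):
        bigger = [None] * new_capacity
        for i in range(self._size):              # copy existing elements over
            bigger[i] = self._data[i]
        self._data = bigger
        self._capacity = new_capacity

    def __getitem__(self, index):                # O(1) index-based access
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._data[index]

v = DynamicArray()
for i in range(10):
    v.append(i)
print(v.size(), v.capacity(), v[9])  # 10 16 9
```

Because each resize doubles the capacity, the occasional O(n) copy is spread across many cheap appends, which is what makes `append` amortized O(1).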
A stack is a fundamental data structure that models a First-In-Last-Out (FILO) or Last-In-First-Out (LIFO) strategy. In essence, the most recently added element (the top of the stack) is the first one to be removed. This data structure is widely utilized across various programming languages to execute a range of operations, such as arithmetic computations, character printing, and more.
[5] <- top
[4]
[3]
[2]
[1] <- bottom
Typical operations associated with a stack include:
- `push(element)`: places a new element on top of the stack, which becomes the most recent addition.
- `pop()`: returns the top element of the stack and simultaneously removes it.
- `top()`: returns the top element without removing it from the stack, allowing you to inspect the most recent addition.
- `is_empty()`: checks if the stack is empty, a useful operation before attempting `pop()` or `top()` to prevent errors.
| Operation     | Average case | Worst case |
| ------------- | ------------ | ---------- |
| Access        | O(n)         | O(n)       |
| Insert (Push) | O(1)         | O(1)       |
| Delete (Pop)  | O(1)         | O(1)       |
| Search        | O(n)         | O(n)       |
As can be seen, while push and pop operations are constant time, access (retrieving an element without popping) and search operations require linear time since, in the worst case, all elements must be inspected.
Stacks are incredibly versatile and are used in a multitude of applications, including managing function calls (the call stack), evaluating and parsing expressions, implementing undo/redo features, and backtracking algorithms such as maze solving.
Stacks can be implemented using various underlying data structures, but arrays and linked lists are among the most common due to their inherent characteristics which nicely complement the operations and needs of a stack.
The choice between arrays and linked lists often depends on the specific requirements of the application, such as memory usage and the cost of operations. With an array, resizing can be expensive but access is faster, while with a linked list, insertion and deletion are faster but more memory is used due to the extra storage of pointers.
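In Python, a built-in list already offers amortized O(1) appends and pops at its end, so a thin wrapper is enough for a sketch (the class name and error messages are illustrative):

```python
class Stack:
    def __init__(self):
        self._items = []          # the end of the list acts as the top of the stack

    def push(self, element):
        self._items.append(element)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop() from an empty stack")
        return self._items.pop()  # removes and returns the most recent addition

    def top(self):
        if self.is_empty():
            raise IndexError("top() on an empty stack")
        return self._items[-1]    # inspect without removing

    def is_empty(self):
        return not self._items

s = Stack()
s.push(1)
s.push(2)
s.push(3)
print(s.pop())  # 3 (last in, first out)
print(s.top())  # 2
```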
A queue is a fundamental data structure that stores elements in a sequence, with modifications made by adding elements to the end (enqueuing) and removing them from the front (dequeuing). The queue follows a First In First Out (FIFO) strategy, implying that the oldest elements are processed first.
[1] <- front
[2]
[3]
[4]
[5] <- rear
The standard operations associated with a queue are:
- `enqueue(element)`: adds a new element to the end of the queue.
- `dequeue()`: returns the front element of the queue and simultaneously removes it.
- `front()`: returns the front element without removing it from the queue, giving a peek at the next item to be dequeued.
- `is_empty()`: checks if the queue is empty, an important step before calling `dequeue()` or `front()` to prevent errors.

While stacks and queues are similar data structures, they differ in several aspects: a stack adds and removes elements at the same end (LIFO), whereas a queue adds at the rear and removes from the front (FIFO), which makes stacks suit backtracking-style problems and queues suit fair, in-order processing.
| Operation        | Average case | Worst case |
| ---------------- | ------------ | ---------- |
| Access           | O(n)         | O(n)       |
| Insert (Enqueue) | O(1)         | O(1)       |
| Delete (Dequeue) | O(1)         | O(1)       |
| Search           | O(n)         | O(n)       |
Enqueue and dequeue operations are constant time, while accessing or searching for a specific element requires linear time, as all elements may need to be inspected.
Queues are incredibly versatile and find use in a variety of real-world scenarios, including task scheduling, print job management, breadth-first search in graphs, and buffering data between producers and consumers.
Queues can be implemented using various underlying data structures, including arrays, linked lists, circular buffers, and dynamic arrays. The choice of implementation depends on the specific requirements of the application, such as memory usage, the cost of operations, and whether the size of the queue changes frequently.
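Python's standard library provides `collections.deque`, which supports O(1) appends and pops at both ends, so it makes a natural backing store for a sketch (a plain list would make dequeue O(n), because every remaining element shifts):

```python
from collections import deque

class Queue:
    def __init__(self):
        self._items = deque()

    def enqueue(self, element):
        self._items.append(element)    # add at the rear

    def dequeue(self):
        if self.is_empty():
            raise IndexError("dequeue() from an empty queue")
        return self._items.popleft()   # remove from the front (FIFO)

    def front(self):
        if self.is_empty():
            raise IndexError("front() on an empty queue")
        return self._items[0]

    def is_empty(self):
        return not self._items

q = Queue()
q.enqueue("a")
q.enqueue("b")
q.enqueue("c")
print(q.dequeue())  # "a" (the oldest element leaves first)
print(q.front())    # "b"
```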
A heap is a specialized tree-based data structure that satisfies the heap property. There are two main types of heaps: max-heaps, in which every parent is greater than or equal to its children, and min-heaps, in which every parent is less than or equal to its children.
Heaps are always complete binary trees, which means they are filled at all levels, except the last, which is filled from left to right. This structure ensures that the heap remains balanced.
[100]
 ├── [19]
 │    ├── [17]
 │    └── [3]
 └── [36]
      ├── [25]
      └── [1]
The basic operations of a heap are as follows:
- `insert(value)`: adds a new element to the heap while ensuring that the heap property is preserved.
- `search(value)`: searches for a specific value in the heap; note that heaps do not allow efficient arbitrary searches.
- `remove(value)`: deletes a specific value from the heap, then restructures the heap to maintain its properties.
| Operation      | Average case  |
| -------------- | ------------- |
| Find min/max   | O(1)          |
| Delete min/max | O(log n)      |
| Insert         | O(log n)      |
| Merge          | O(m log(m+n)) |
These time complexities make heaps especially useful in situations where we need to repeatedly remove the minimum (or maximum) element.
Heaps are versatile and find widespread use across various computational problems, such as implementing priority queues, heapsort, and graph algorithms like Dijkstra's shortest path and Prim's minimum spanning tree.
Heaps can be represented efficiently using a one-dimensional array, which allows for easy calculation of the parent, left, and right child positions. The standard mapping from the heap to this array representation is as follows:
- The root element is stored at index `1` of the array representation.
- The left child of the node at index `i` is positioned at index `2*i` in the array.
- The right child of the node at index `i` can be found at index `2*i + 1`.
- To find the parent of the node at index `i`, calculate the index as `floor(i/2)`.

This layout keeps the parent-child relationships consistent and makes navigation within the heap straightforward and efficient.
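The index arithmetic above translates directly into code. Below is a small 1-indexed max-heap sketch in Python, where slot 0 of the backing list is left unused so the formulas match exactly; Python's standard `heapq` module provides a production-ready (0-indexed, min-heap) alternative:

```python
class MaxHeap:
    def __init__(self):
        self._a = [None]                 # slot 0 unused: the root lives at index 1

    def insert(self, value):
        self._a.append(value)
        i = len(self._a) - 1
        while i > 1 and self._a[i // 2] < self._a[i]:   # sift up: parent is floor(i/2)
            self._a[i // 2], self._a[i] = self._a[i], self._a[i // 2]
            i //= 2

    def find_max(self):
        if len(self._a) == 1:
            raise IndexError("empty heap")
        return self._a[1]

    def delete_max(self):
        if len(self._a) == 1:
            raise IndexError("empty heap")
        top = self._a[1]
        self._a[1] = self._a[-1]         # move the last element to the root
        self._a.pop()
        i, n = 1, len(self._a) - 1
        while True:                      # sift down: swap with the larger child
            left, right, largest = 2 * i, 2 * i + 1, i
            if left <= n and self._a[left] > self._a[largest]:
                largest = left
            if right <= n and self._a[right] > self._a[largest]:
                largest = right
            if largest == i:
                return top
            self._a[i], self._a[largest] = self._a[largest], self._a[i]
            i = largest

h = MaxHeap()
for v in (17, 100, 19, 3, 36, 25, 1):   # the values from the diagram above
    h.insert(v)
print(h.find_max())    # 100
print(h.delete_max())  # 100
print(h.find_max())    # 36
```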
A Binary Search Tree (BST) is a type of binary tree where each node has up to two children and maintains a specific ordering property among its values. For every node, the values of all nodes in its left subtree are less than its own value, and the values of all nodes in its right subtree are greater than its own value. In a well-balanced BST, this property enables efficient lookup, addition, and deletion operations, ideally distributing the nodes evenly across the left and right subtrees. Note that BSTs typically disallow duplicate values.
      [8]
     /   \
   [3]   [10]
  /   \     \
[1]   [6]   [14]
The fundamental operations provided by a BST are:
- `insert(value)`: inserts a new value into the tree, maintaining the BST property.
- `search(value)`: searches for a specific value in the tree; if the value exists, the operation returns the node, and otherwise it typically returns `null`.
- `remove(value)`: deletes a value from the tree, preserving the BST property; if the node to be removed has two children, it is replaced with its in-order predecessor or successor.
- `minimum()`: retrieves the smallest value in the tree, which is the leftmost node.
- `maximum()`: retrieves the largest value in the tree, which is the rightmost node.

The efficiency of BST operations depends on the height of the tree. In a balanced tree, this results in an average-case time complexity of `O(log n)`. However, in the worst-case scenario (a degenerate or unbalanced tree), the time complexity degrades to `O(n)`.
| Operation | Average case | Worst case |
| --------- | ------------ | ---------- |
| Access    | O(log n)     | O(n)       |
| Insert    | O(log n)     | O(n)       |
| Delete    | O(log n)     | O(n)       |
| Search    | O(log n)     | O(n)       |
A BST is typically implemented using a node-based model where each node encapsulates a value and pointers to the left and right child nodes. A special root pointer points to the first node, if any.
Insertion, deletion, and search operations are often implemented recursively to traverse the tree starting from the root, making comparisons at each node to guide the direction of traversal (left or right). This design pattern exploits the recursive nature of the tree structure and results in clean, comprehensible code.
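A recursive sketch of the insert and search paths in Python follows; `remove` is omitted for brevity, since it follows the predecessor/successor replacement described above. The values match the example tree:

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    """Return the (possibly new) subtree root after inserting value."""
    if root is None:
        return BSTNode(value)
    if value < root.value:
        root.left = insert(root.left, value)     # smaller values go left
    elif value > root.value:
        root.right = insert(root.right, value)   # larger values go right
    return root                                   # duplicates are ignored

def search(root, value):
    """Return the node holding value, or None if it is absent."""
    if root is None or root.value == value:
        return root
    if value < root.value:
        return search(root.left, value)
    return search(root.right, value)

root = None
for v in (8, 3, 10, 1, 6, 14):
    root = insert(root, v)
print(search(root, 6) is not None)   # True
print(search(root, 7) is not None)   # False
```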
Named after its inventors, G.M. Adelson-Velsky and E.M. Landis, AVL trees are a subtype of binary search trees (BSTs) that self-balance. Unlike regular BSTs, AVL trees maintain a stringent balance by ensuring the height difference between the left and right subtrees of any node is at most one. This tight control on balance guarantees optimal performance for lookups, insertions, and deletions, making AVL trees highly efficient.
[20]
/ \
[10] [30]
/ \ / \
[5] [15] [25] [35]
The defining characteristics of AVL trees include the BST ordering property, a balance factor (the height difference between the left and right subtrees) of -1, 0, or +1 at every node, and automatic rebalancing via rotations after insertions and deletions.
AVL trees support standard BST operations with some modifications to maintain balance:
- `insert(value)`: adds a value to the tree while preserving the AVL properties.
- `search(value)`: finds and returns a node with the given value in the tree.
- `remove(value)`: deletes a node with the given value while keeping the tree balanced.
- `rotate_left()`, `rotate_right()`: perform rotation operations to balance the tree after insertions or deletions.
| Operation | Average case | Worst case |
| --------- | ------------ | ---------- |
| Access    | O(log n)     | O(log n)   |
| Insert    | O(log n)     | O(log n)   |
| Delete    | O(log n)     | O(log n)   |
| Search    | O(log n)     | O(log n)   |
The AVL tree's self-balancing property lends itself to applications where worst-case lookup time matters, such as in-memory indexes, ordered maps and sets, and databases that require predictable query performance.
The secret to maintaining AVL trees' balance is the application of rotation operations. Four types of rotations can occur based on the tree's balance state:

- A right rotation, applied when the left subtree of the left child is too tall (the left-left case).
- A left rotation, applied when the right subtree of the right child is too tall (the right-right case).
- A left-right rotation, a left rotation of the left child followed by a right rotation (the left-right case).
- A right-left rotation, a right rotation of the right child followed by a left rotation (the right-left case).
These rotations keep AVL trees balanced, ensuring consistent performance.
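To illustrate the mechanics, here is a Python sketch of the two primitive rotations and the balance-factor bookkeeping they rely on. A full AVL insert would call these from its rebalancing step; the names are illustrative:

```python
class AVLNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 1                  # height of the subtree rooted here

def height(node):
    return node.height if node else 0

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.left) - height(node.right)   # AVL keeps this in {-1, 0, 1}

def rotate_right(y):
    """Fix a left-heavy subtree: lift y's left child above it."""
    x = y.left
    y.left = x.right
    x.right = y
    update_height(y)                     # y sits lower now, so update it first
    update_height(x)
    return x                             # x is the new subtree root

def rotate_left(x):
    """Fix a right-heavy subtree: lift x's right child above it."""
    y = x.right
    x.right = y.left
    y.left = x
    update_height(x)
    update_height(y)
    return y

# A left-leaning chain 30 <- 20 <- 10 becomes balanced after one right rotation.
r = AVLNode(30)
r.left = AVLNode(20)
r.left.left = AVLNode(10)
update_height(r.left)
update_height(r)
r = rotate_right(r)
print(r.value, r.left.value, r.right.value)  # 20 10 30
```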
Red-Black trees are a type of self-balancing binary search tree in which each node bears an additional attribute: a color. Each node in a Red-Black tree is colored either red or black, and these color attributes follow specific rules to maintain the tree's balance. This balance ensures that fundamental tree operations such as insertions, deletions, and searches remain efficient.
        (30B)
       /     \
   (20R)     (40B)
   /   \     /   \
(10B) (25B) (35R) (50R)
Red-Black trees abide by the following essential properties:

- Every node is either red or black.
- The root is always black.
- Every leaf (NIL node) is considered black.
- A red node cannot have a red child; both of its children must be black.
- Every path from a given node down to any of its descendant NIL leaves contains the same number of black nodes.
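These invariants can be checked mechanically. The short Python sketch below (names are illustrative) validates them for simple colored nodes, treating `None` as a black NIL leaf, and returns the black-height when a subtree is valid:

```python
class RBNode:
    def __init__(self, value, color, left=None, right=None):
        self.value, self.color = value, color    # color is "R" or "B"
        self.left, self.right = left, right

def black_height(node):
    """Return the black-height of the subtree, or None if a rule is broken."""
    if node is None:
        return 1                                  # NIL leaves count as black
    if node.color == "R":
        for child in (node.left, node.right):     # a red node needs black children
            if child is not None and child.color == "R":
                return None
    lh, rh = black_height(node.left), black_height(node.right)
    if lh is None or rh is None or lh != rh:      # both sides must agree
        return None
    return lh + (1 if node.color == "B" else 0)

def is_valid_rb(root):
    return root is not None and root.color == "B" and black_height(root) is not None

# The example tree from the diagram above:
tree = RBNode(30, "B",
              RBNode(20, "R", RBNode(10, "B"), RBNode(25, "B")),
              RBNode(40, "B", RBNode(35, "R"), RBNode(50, "R")))
print(is_valid_rb(tree))  # True
```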
Red-Black trees support typical BST operations, albeit with additional steps to manage node colors and balance:
- `insert(value)`: adds a value to the tree, managing colors and rotations to maintain balance.
- `search(value)`: finds and returns a node with the given value in the tree.
- `remove(value)`: deletes a node with the given value, adjusting colors and performing rotations as necessary to keep the tree balanced.
- `rotate_left()`, `rotate_right()`: perform rotation operations to maintain balance after insertions or deletions.
| Operation | Average case | Worst case |
| --------- | ------------ | ---------- |
| Access    | O(log n)     | O(log n)   |
| Insert    | O(log n)     | O(log n)   |
| Delete    | O(log n)     | O(log n)   |
| Search    | O(log n)     | O(log n)   |
Red-Black trees find application in several areas, including the ordered containers of many standard libraries (C++'s std::map and std::set and Java's TreeMap are commonly implemented as Red-Black trees) and the Linux kernel, which uses them for task scheduling and memory management.
Red-Black trees are a sophisticated binary search tree variant. They require careful management of node colors during insertions and deletions, often involving multiple rebalancing steps: after a modification, the tree is repaired by recoloring nodes and performing left or right rotations until the color properties above are restored. These careful rotations and recoloring steps enable Red-Black trees to maintain balance and ensure efficient performance across various operations.
Hash tables, often referred to as hash maps or dictionaries, are a type of data structure that employs a hash function to pair keys with their corresponding values. This mechanism enables efficient insertion, deletion, and lookup of items within a collection, making hash tables a fundamental tool in many computing scenarios. They are often used in contexts where fast lookup and insertion operations are important, such as database management and networking applications.
+-----+-------+
| Key | Value |
|-----|-------|
| K1  | V1    |
| K2  | V2    |
| K3  | V3    |
| K4  | V4    |
| ... | ...   |
+-----+-------+
Hash tables support a range of operations, which include:
- `is_empty()`: checks if the hash table is empty, returning `True` if so and `False` otherwise.
- `insert(key, value)`: adds a new key-value pair to the hash table.
- `retrieve(key)`: fetches the value associated with a specified key.
- `update(key, value)`: modifies the value associated with a given key.
- `delete(key)`: removes the key-value pair corresponding to the provided key from the hash table.
- `traverse()`: allows iteration over all the key-value pairs in the hash table.

The performance of hash table operations depends on the hash function's quality, the strategy for resolving collisions, and the load factor (the ratio of the number of key-value pairs to the number of slots or buckets in the table).
| Operation | Average case | Worst case |
| --------- | ------------ | ---------- |
| Access    | -            | -          |
| Insert    | O(1)         | O(n)       |
| Delete    | O(1)         | O(n)       |
| Search    | O(1)         | O(n)       |
Hash tables are ubiquitous across various domains, including database indexing, caching, symbol tables in compilers and interpreters, and the dictionary and set types built into many programming languages.
The creation of a hash table involves setting up a data structure to store key-value pairs, designing a hash function to map keys to slots in the data structure, and devising a collision resolution strategy to handle instances where multiple keys hash to the same slot.
Common data structures used include arrays (for open addressing) and linked lists or balanced trees (for separate chaining).
A hash function takes a key (like a string) and turns it into a number, which then decides where to store the data in a table. The goal is to spread out the keys evenly so that no single spot gets overloaded. One simple method for strings is to add up the ASCII values of the characters and then use the remainder from dividing by the table size to pick a slot. Using prime numbers for the table size or in the calculations can help achieve a more balanced spread.
# x: a numeric key (for a string, e.g. the sum of its ASCII values)
# m: number of buckets
h(x) = x % m
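The same idea in Python, using the summed-ASCII scheme described above (in practice you would reach for the built-in `hash()` instead of rolling your own):

```python
def simple_hash(key: str, m: int) -> int:
    """Map a string key to a bucket index by summing its character codes."""
    return sum(ord(ch) for ch in key) % m

m = 7  # a prime table size helps spread the keys more evenly
for fruit in ("apple", "banana", "mango"):
    print(fruit, "->", simple_hash(fruit, m))
```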
Collisions occur when two or more keys hash to the same slot. There are two common strategies to handle these collisions: chaining and open addressing.
With chaining, each slot or bucket in the hash table acts as a linked list. Each new key-value pair is appended to its designated list. While this approach allows for efficient insertions and deletions, the worst-case time complexity for lookups escalates to `O(n)`, where `n` is the number of keys stored.
The following is an example:
index Bucket
---------------------
0 -> ["apple"]
1 -> ["banana", "mango"]
2 -> ["cherry"]
3 -> []
4 -> ["dates", "elderberry", "fig"]
...
In the above example, "banana" and "mango" hash to the same index (1), so they are stored in the same linked list.
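A compact sketch of separate chaining in Python follows, with each bucket held as an ordinary list of (key, value) pairs; buckets of linked lists behave the same way asymptotically. The class and method names are illustrative:

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self._buckets = [[] for _ in range(m)]    # one chain per slot

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def insert(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                          # key already present: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))               # collision or not, just chain it

    def retrieve(self, key):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return
        raise KeyError(key)

t = ChainedHashTable()
t.insert("banana", 3)
t.insert("mango", 5)
print(t.retrieve("mango"))  # 5, even if both keys land in the same bucket
```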
In open addressing, the hash table is an array, and each key-value pair is stored directly in an array slot. When a collision arises, the hash table searches for the next available slot according to a predefined probe sequence. Common probing techniques include linear probing, where the table checks each slot one by one, and quadratic probing, which checks slots based on a quadratic function of the distance from the original hash.
Here's an example using linear probing:
index Bucket
---------------------
0 -> "apple"
1 -> "cherry"
2 -> "banana" (originally hashed to 1, moved due to collision)
3 -> "mango" (originally hashed to 1, moved due to collision)
4 -> "dates"
5 -> "elderberry"
6 -> "fig"
...
In this case, "banana" and "mango" were hashed to index 1, but since it was already occupied by "cherry", they were moved to the next available slots (2 and 3, respectively).