Last modified: September 10, 2022
This article is written in: 🇺🇸
Joining and Splitting Arrays
In NumPy, manipulating the structure of arrays is a common operation. Whether combining multiple arrays into one or splitting a single array into several parts, NumPy provides a set of intuitive functions to achieve these tasks efficiently. Understanding how to join and split arrays is essential for organizing data, preparing it for analysis, and optimizing computational performance. This guide covers various methods to join and split arrays, offering detailed explanations and practical examples to help you utilize these tools effectively.
Stacking Arrays
Stacking is the technique of joining arrays along a new axis, effectively increasing the dimensionality of the resulting array. NumPy's np.stack()
function allows you to specify the axis along which the arrays will be stacked, providing flexibility in how the data is combined.
Example: Stacking Along a New Axis
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Vertical stacking (default axis=0)
c = np.stack((a, b))
print("Vertically stacked:\n", c)
# Horizontal stacking (axis=1)
d = np.stack((a, b), axis=1)
print("\nHorizontally stacked:\n", d)
Expected output:
Vertically stacked:
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
Horizontally stacked:
[[[1 2]
[5 6]]
[[3 4]
[7 8]]]
Vertical Stacking (axis=0
):
np.stack((a, b))
stacks arraysa
andb
along a new first axis (axis=0).- The resulting array
c
has a shape of(2, 2, 2)
, indicating two stacked 2x2 matrices. - Practical Use Case: Vertical stacking is useful when you need to combine datasets that have the same number of columns but represent different observations or samples.
Horizontal Stacking (axis=1
):
np.stack((a, b), axis=1)
stacks arraysa
andb
along a new second axis (axis=1).- The resulting array
d
also has a shape of(2, 2, 2)
, but the stacking orientation differs, effectively pairing corresponding rows from each array. - Practical Use Case: Horizontal stacking is beneficial when combining features from different datasets that share the same number of rows, allowing for the integration of multiple feature sets side by side.
Concatenating Arrays
Concatenation merges arrays along an existing axis without introducing a new one. The np.concatenate()
function is versatile, allowing you to specify the axis along which the arrays should be joined.
Example: Concatenation Along Existing Axes
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Vertical concatenation (default axis=0)
c = np.concatenate((a, b))
print("Vertically concatenated:\n", c)
# Horizontal concatenation (axis=1)
d = np.concatenate((a, b), axis=1)
print("\nHorizontally concatenated:\n", d)
Expected output:
Vertically concatenated:
[[1 2]
[3 4]
[5 6]
[7 8]]
Horizontally concatenated:
[[1 2 5 6]
[3 4 7 8]]
Vertical Concatenation (axis=0
):
np.concatenate((a, b))
joins arraysa
andb
along the first axis (rows).- The resulting array
c
has a shape of(4, 2)
, effectively stackingb
belowa
. - Practical Use Case: Vertical concatenation is ideal for combining datasets with the same number of columns but different rows, such as appending new data samples to an existing dataset.
Horizontal Concatenation (axis=1
):
np.concatenate((a, b), axis=1)
joins arraysa
andb
along the second axis (columns).- The resulting array
d
has a shape of(2, 4)
, placingb
to the right ofa
. - Practical Use Case: Horizontal concatenation is useful when merging feature sets from different sources that have the same number of observations, enabling the combination of multiple attributes into a single dataset.
Appending to Arrays
Appending involves adding elements or arrays to the end of an existing array. The np.append()
function is straightforward and allows for both simple and complex append operations.
Example: Appending Values
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Append b to a
c = np.append(a, b)
print("Appended array:\n", c)
Expected output:
Appended array:
[1 2 3 4 5 6]
Explanation:
np.append(a, b)
concatenates arrayb
to the end of arraya
, resulting in a new arrayc
that combines all elements from both arrays.- Practical Use Case: Appending is useful when you need to dynamically add new data points to an existing array, such as adding new user inputs or streaming data to a dataset.
Additional Considerations:
By default, np.append()
flattens the input arrays if the axis is not specified. To append along a specific axis, you must ensure that the arrays have compatible shapes.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
# Append along axis=0 (rows)
c = np.append(a, b, axis=0)
print("Appended along axis 0:\n", c)
# Append along axis=1 (columns)
d = np.append(a, [[5], [6]], axis=1)
print("\nAppended along axis 1:\n", d)
Expected output:
Appended along axis 0:
[[1 2]
[3 4]
[5 6]]
Appended along axis 1:
[[1 2 5]
[3 4 6]]
When specifying an axis, ensure that the dimensions other than the specified axis match between the arrays being appended.
Splitting Arrays
Splitting breaks down an array into smaller subarrays. This operation is useful for dividing data into manageable chunks, preparing batches for machine learning models, or separating data into distinct groups for analysis. NumPy's np.split()
function is commonly used for this purpose.
Regular and Custom Splits
Regular splits divide an array into equal parts, while custom splits allow you to specify the exact indices where the array should be divided.
a = np.array([1, 2, 3, 4, 5, 6])
# Split into three equal parts
b = np.split(a, 3)
print("Regular split:\n", b)
# Split at the 2nd and 4th indices
c = np.split(a, [2, 4])
print("\nCustom split:\n", c)
Expected output:
Regular split:
[array([1, 2]), array([3, 4]), array([5, 6])]
Custom split:
[array([1, 2]), array([3, 4]), array([5, 6])]
Regular Split (np.split(a, 3)
):
- Divides array
a
into three equal parts. - Each resulting subarray has two elements.
- Practical Use Case: Regular splitting is useful when you need to divide data into uniform batches for parallel processing or batch training in machine learning.
Custom Split (np.split(a, [2, 4])
):
- Splits array
a
at indices 2 and 4, resulting in three subarrays: the first containing elements up to index 2, the second from index 2 to 4, and the third from index 4 onwards. - Practical Use Case: Custom splitting allows for flexible division of data based on specific criteria or requirements, such as separating features from labels in a dataset or dividing a dataset into training and testing sets.
Additional Considerations:
When performing a regular split, the array must be divisible into the specified number of sections. If it is not, NumPy will raise a ValueError
.
a = np.array([1, 2, 3, 4, 5])
try:
b = np.split(a, 3)
except ValueError as e:
print("Error:", e)
Expected output:
Error: array split does not result in an equal division
Depending on the specific needs, other splitting functions like np.hsplit()
, np.vsplit()
, and np.dsplit()
can be used to split arrays along specific axes.
Advanced Joining and Splitting Techniques
Beyond basic stacking, concatenation, appending, and splitting, NumPy offers additional functions that provide more control and flexibility when manipulating array structures.
Example: HStack and VStack
Horizontal Stack (hstack
):
- Combines arrays horizontally (column-wise), aligning them side by side.
- Useful for merging features from different datasets or adding new features to an existing dataset.
Vertical Stack (vstack
):
- Combines arrays vertically (row-wise), stacking them on top of each other.
- Ideal for adding new samples or observations to an existing dataset.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Horizontal stack
h_stack = np.hstack((a, b))
print("Horizontal stack:\n", h_stack)
# Vertical stack
v_stack = np.vstack((a, b))
print("\nVertical stack:\n", v_stack)
Expected output:
Horizontal stack:
[1 2 3 4 5 6]
Vertical stack:
[[1 2 3]
[4 5 6]]
Horizontal Stack (np.hstack((a, b))
):
- Concatenates arrays
a
andb
horizontally, resulting in a single 1D array containing all elements from both arrays. - Practical Use Case: Combining multiple feature vectors into a single feature set for machine learning models.
Vertical Stack (np.vstack((a, b))
):
- Stacks arrays
a
andb
vertically, creating a 2D array where each original array becomes a separate row. - Practical Use Case: Adding new data samples to an existing dataset, where each new sample is represented as a separate row.
Example: DStack
- Stacks arrays along the third axis (depth), creating a 3D array.
- Useful for combining multiple 2D arrays into a single 3D array, such as adding color channels to grayscale images.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Depth stack
d_stack = np.dstack((a, b))
print("Depth stack:\n", d_stack)
Expected output:
Depth stack:
[[[1 5]
[2 6]]
[[3 7]
[4 8]]]
Explanation:
np.dstack((a, b))
combines arraysa
andb
along a new third axis, resulting in a 3D array where each "layer" corresponds to one of the original arrays.- Practical Use Case: In image processing,
dstack
can be used to add color channels (e.g., combining grayscale images into RGB images by stacking multiple channels).
Practical Applications and Considerations
Understanding how to join and split arrays is crucial for various data manipulation tasks, including:
- Data Preprocessing involves combining and splitting datasets to prepare them for training and testing machine learning models, ensuring that the data is properly formatted and ready for analysis.
- Data Augmentation creates variations of existing data, which helps in training robust machine learning models by providing diverse examples for learning.
- Feature Engineering includes the process of combining different feature sets into a single array, facilitating comprehensive analysis and enhancing the model's ability to learn relevant patterns.
- Batch Processing refers to splitting large datasets into smaller, manageable batches, which allows for efficient processing and analysis, especially when dealing with limited computational resources.
Summary Table
Operation | Method/Function | Description | Example Code | Example Output |
Stack | stack() |
Joins arrays along a new axis. | np.stack((a, b), axis=0) |
[[[1 2] [3 4]] [[5 6] [7 8]]] |
Horizontal Stack | hstack() |
Combines arrays horizontally along columns. | np.hstack((a, b)) |
[1 2 3 4 5 6] |
Vertical Stack | vstack() |
Combines arrays vertically along rows. | np.vstack((a, b)) |
[[1 2 3] [4 5 6]] |
Depth Stack | dstack() |
Stacks arrays along the third axis (depth). | np.dstack((a, b)) |
[[[1 5] [2 6]] [[3 7] [4 8]]] |
Concatenate | concatenate() |
Merges arrays along an existing axis. | np.concatenate((a, b), axis=0) |
[[1 2] [3 4] [5 6] [7 8]] |
Append | append() |
Appends values to the end of an array. | np.append(a, b) |
[1 2 3 4 5 6] |
Split | split() |
Splits an array into multiple sub-arrays. | np.split(a, [2, 4]) |
[array([1, 2]), array([3, 4]), array([5, 6])] |
Horizontal Split | hsplit() |
Splits an array into multiple sub-arrays horizontally (column-wise). | np.hsplit(a, 2) |
[array([[1], [3]]), array([[2], [4]])] |
Vertical Split | vsplit() |
Splits an array into multiple sub-arrays vertically (row-wise). | np.vsplit(a, 2) |
[array([[1, 2]]), array([[3, 4]])] |