Last modified: May 19, 2025
This article is written in: 🇺🇸
Vectors
A vector is a mathematical entity characterized by both magnitude and direction. Vectors are essential in various fields such as linear algebra, calculus, physics, computer science, data analysis, and machine learning. In the context of NumPy, vectors are represented as one-dimensional arrays, enabling efficient computation and manipulation. This guide delves into the definition of vectors, their properties, and the operations that can be performed on them using NumPy, complemented by practical examples to illustrate each concept.
Definitions
A vector space over a field $\mathbb{F}$ (here the reals $\mathbb{R}$) is a set equipped with vector addition and scalar multiplication that satisfy eight axioms (closure, associativity, identity, inverses, distributive laws, etc.). The canonical example is the n-dimensional real coordinate space $\mathbb{R}^n$.
Vector in $\mathbb{R}^n$
Formal definition. An element $\mathbf v \in \mathbb{R}^n$ is an ordered $n$-tuple of real numbers
$$ \mathbf v = (v_1,\dots,v_n) \equiv \sum_{i=1}^n v_i\mathbf e_i $$
where $\{\mathbf e_i\}_{i=1}^n$ is the standard basis, with $\mathbf e_i$ having a 1 in the $i$-th position and zeros elsewhere.
A vector encodes magnitude and direction relative to the origin. In data-science terms, it stores the feature values of one sample.
NumPy quick-start.
import numpy as np
v = np.array([4, -2, 7]) # element of R^3
type(v), v.shape # (numpy.ndarray, (3,))
Row vs Column Representation
Vectors can be written in two orientations, row vectors and column vectors, each serving a different role in computations. The choice of orientation determines how vectors interact with matrices and with each other.
Row Vector
A row vector is a $1 \times n$ matrix, meaning it has one row and $n$ columns. Its elements are laid out horizontally:
$$ v = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} $$
- It has shape $(1, n)$.
- Often used to represent a single data sample with $n$ features (e.g., one row of a dataset).
- Can multiply on the right by an $n \times m$ matrix $A$, yielding another row vector of shape $(1, m)$:
$$\vec v_{\text{row}}A \in \mathbb{R}^{1 \times m}$$
Example:
$$ \vec v_{\text{row}} = [1,2,3] \quad\text{is a }1\times3\text{ row vector in }\mathbb{R}^3. $$
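In NumPy, a plain 1-D array has shape `(n,)` and is neither a row nor a column; an explicit row vector needs a two-dimensional shape. A minimal sketch (using `reshape`, one of several equivalent ways):

```python
import numpy as np

v = np.array([1, 2, 3])       # shape (3,): a plain 1-D array
v_row = v.reshape(1, -1)      # shape (1, 3): an explicit row vector
print(v_row, v_row.shape)     # [[1 2 3]] (1, 3)
```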
Column Vector
A column vector is an $n \times 1$ matrix, with elements displayed vertically:
$$ v = \begin{bmatrix} v_1\\ v_2\\ \vdots\\ v_n \end{bmatrix} $$
- It has shape $(n,1)$.
- Central to linear transformations; matrices act on column vectors from the left.
- Multiplying an $m \times n$ matrix $A$ by a column vector yields another column vector of shape $(m,1)$:
$$A\vec v_{\text{col}} \in \mathbb{R}^{m \times 1}$$
Example:
$$ v = \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix} $$
$$ \text{is a }3\times1\text{ column vector in }\mathbb{R}^3. $$
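The corresponding column orientation in NumPy, again as a sketch (adding a trailing axis with `v[:, None]` is equivalent to the `reshape` call):

```python
import numpy as np

v = np.array([1, 2, 3])       # shape (3,)
v_col = v.reshape(-1, 1)      # shape (3, 1): an explicit column vector
print(v_col.shape)            # (3, 1)
```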
Transpose
The transpose operation switches between row and column orientation:
Denoted by a superscript $^T$:
$$\vec v_{\text{row}}^T = \vec v_{\text{col}}$$
and
$$\vec v_{\text{col}}^T = \vec v_{\text{row}}$$
If $v$ is a matrix (or vector) with entries $v_{ij}$, then
$$v^T_{ij} = v_{ji}$$
Why Transpose Matters:
- Ensures dimensions match: you can only multiply a $(1,n)$ by an $(n,1)$ or an $(n,1)$ by a $(1,n)$, etc.
- In more advanced settings (e.g., covariance matrices, orthogonal matrices), transpose plays a key role in defining symmetric and orthogonal properties.
- Dot Product:
$$\vec u \cdot \vec v = \vec u_{\text{row}}\vec v_{\text{col}} = \sum_i u_iv_i$$
Example of Transpose:
$$ v = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} $$
$$ v^T = \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix} $$
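In NumPy, `.T` swaps the orientation of 2-D row and column vectors, but note that it leaves a plain 1-D array unchanged, since there is only one axis to swap. A quick sketch:

```python
import numpy as np

v_row = np.array([[1, 2, 3]])   # shape (1, 3)
print(v_row.T.shape)            # (3, 1): row becomes column

v = np.array([1, 2, 3])         # shape (3,)
print(v.T.shape)                # still (3,): transposing a 1-D array is a no-op
```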
Norms and Length
A norm $||\cdot||$ is a function that assigns a non-negative "length" or "size" to each vector in a vector space, satisfying three core properties:
Positivity:
$$||\vec v|| \ge 0 \ \text{for all } \vec v, \qquad ||\vec v|| = 0 \iff \vec v = \mathbf 0$$
Homogeneity (Scalability):
$$||\alpha \vec v|| = |\alpha|\,||\vec v|| \ \text{for any scalar } \alpha$$
Triangle Inequality:
$$||\vec u + \vec v|| \le ||\vec u|| + ||\vec v|| \ \text{for any vectors } \vec u, \vec v$$
The p-norm (or $L^p$ norm) is a family of norms parameterized by $p \ge 1$, defined for a vector
$$\vec v = (v_1, v_2, \ldots, v_n)$$
as
$$ \lVert \vec v \rVert_p = \left( \sum_{i=1}^{n} \lvert v_i \rvert^p \right)^{1/p} $$
- When $p=1$, this reduces to the L1 norm, the sum of absolute values.
- When $p=2$, it gives the familiar Euclidean norm.
- As $p \to \infty$, it approaches the maximum absolute component.
Why the p-Norm Matters
- By tuning $p$, you emphasize different aspects of the data (e.g., outliers vs. aggregate magnitude).
- In machine learning, different norms encourage different solution structures (e.g., sparsity with L1, smoothness with L2).
- The shape of the "ball" $\{\vec v : ||\vec v||_p \le 1\}$ changes with $p$, affecting feasible regions in optimization.
Common Special Cases:
| $p$ | Name | Unit ball in $\mathbb{R}^2$ | Geometric intuition |
|---|---|---|---|
| 1 | Manhattan | Diamond (rotated square) $\diamond$ | Distance measured along axes (like city blocks) |
| 2 | Euclidean | Circle $\bigcirc$ | "Straight-line" distance in the plane |
| $\infty$ | Chebyshev | Axis-aligned square $\square$ | Maximum coordinate difference (chess-king moves) |
NumPy's `linalg.norm` function makes these norms easy to compute:
import numpy as np
from numpy.linalg import norm
v = np.array([4, -2, 7])
# L1 norm: sum of absolute values
l1 = norm(v, ord=1)
# L2 norm: Euclidean length (default)
l2 = norm(v)  # same as norm(v, ord=2)
# Infinity norm: maximum absolute component
linf = norm(v, ord=np.inf)
print(f"L1: {l1}, L2: {l2}, L∞: {linf}")
- `ord=1` computes $\sum_i |v_i|$.
- `ord=2` (or the default) computes $\sqrt{\sum_i v_i^2}$.
- `ord=np.inf` computes $\max_i |v_i|$.
Why Norms Matter in Practice:
Similarity and Distance
In algorithms like k-Nearest Neighbors (k-NN), the choice of norm directly affects which points are deemed "closest," altering classification or regression results.
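As a small illustration (the query point and candidate coordinates are invented), the nearest neighbour can change with the norm:

```python
import numpy as np

query = np.array([0.0, 0.0])
candidates = np.array([[3.0, 3.0],
                       [0.0, 5.0]])

for p in (1, 2, np.inf):
    # Distance from the query to each candidate under the p-norm
    dists = np.linalg.norm(candidates - query, ord=p, axis=1)
    print(f"p={p}: distances={np.round(dists, 2)}, nearest index={np.argmin(dists)}")
# p=1 picks [0, 5]; p=2 and p=inf pick [3, 3]
```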
Optimization and Regularization
- L1 regularization ($\ell_1$ penalty) tends to produce sparse solutions (many zero coefficients).
- L2 regularization ($\ell_2$ penalty) tends to spread error evenly among parameters, leading to smaller overall weights.
Feasible Regions
When you enforce a norm constraint (e.g., $||x||_p \le 1$), the shape of that feasible set changes with $p$, influencing which solutions are accessible in constrained optimization.
Vector Operations
Vector addition
For $\mathbf{u},\mathbf{v}\in\mathbb{R}^n$ the sum is
$$ \mathbf{u}+\mathbf{v}= \bigl(u_1+v_1,u_2+v_2,\dots,u_n+v_n\bigr). $$
- Commutative: $\mathbf{u}+\mathbf{v}=\mathbf{v}+\mathbf{u}$
- Associative: $(\mathbf{u}+\mathbf{v})+\mathbf{w}=\mathbf{u}+(\mathbf{v}+\mathbf{w})$
- Identity element: $\mathbf{0}$ (all zeros)
import numpy as np
a = np.array([9, 2, 5])
b = np.array([-3, 8, 2])
res = np.add(a, b) # or simply a + b
print(res) # → [ 6 10  7]
Complexity. $O(n)$ arithmetic operations; NumPy runs this in native C, so it is vectorised and avoids Python loops.
Typical uses.
- Merging feature vectors from multiple sensors or modalities
- Displacement composition in kinematics
- Gradient accumulation in machine-learning optimisers
Scalar multiplication
Given a scalar $\alpha\in\mathbb{R}$ and $\mathbf{u}\in\mathbb{R}^n$,
$$ \alpha\mathbf{u}= \bigl(\alpha u_1,\alpha u_2,\dots,\alpha u_n\bigr). $$
Multiplies the magnitude by $|\alpha|$; for negative $\alpha$ the direction is flipped (180° rotation).
v = np.array([6, 3, 4])
alpha = 2
scaled = alpha * v # element-wise; same as np.multiply(alpha, v)
print(scaled) # → [12  6  8]
Distributive law.
$$\alpha(\mathbf{u}+\mathbf{v})=\alpha\mathbf{u}+\alpha\mathbf{v}$$
Useful for normalising vectors to unit length: `u / np.linalg.norm(u)`.
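A minimal normalisation sketch with a guard against the zero vector (the helper name `unit` and the `eps` tolerance are illustrative choices, not NumPy API):

```python
import numpy as np

def unit(u, eps=1e-12):
    """Return u scaled to unit L2 length."""
    n = np.linalg.norm(u)
    if n < eps:                          # the zero vector has no direction
        raise ValueError("cannot normalise a (near-)zero vector")
    return u / n

v_hat = unit(np.array([6.0, 3.0, 4.0]))
print(v_hat, np.linalg.norm(v_hat))      # unit-length vector, norm = 1.0
```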
Dot (inner) product
Definition.
$$ \mathbf{u}\cdot\mathbf{v}= \sum_{i=1}^{n}u_i v_i. $$
Geometry.
$$ \mathbf{u}\cdot\mathbf{v}=\lVert\mathbf{u}\rVert_2\lVert\mathbf{v}\rVert_2 \cos\theta, $$
so it captures both magnitudes and their relative orientation $\theta$.
u = np.array([9, 2, 5])
v = np.array([-3, 8, 2])
dp = np.dot(u, v) # or u @ v in NumPy ≥ 1.10
print(dp) # → -1
An output of zero indicates orthogonality.
Negative values imply an angle greater than 90°, explaining the $-1$ above (≈ 90.6°).
Typical uses.
- Cosine-similarity search in recommender systems
- Work done by a force along a displacement ($W=\mathbf{F}\cdot\mathbf{s}$)
- Projection of one vector onto another:
$$\displaystyle \mathrm{proj}_{\mathbf{v}}(\mathbf{u}) = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{v}\rVert_2^2}\mathbf{v}$$
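A short sketch of the projection formula, reusing the vectors from the dot-product example above:

```python
import numpy as np

u = np.array([9.0, 2.0, 5.0])
v = np.array([-3.0, 8.0, 2.0])

# proj_v(u) = (u·v / ||v||^2) * v
proj = (np.dot(u, v) / np.dot(v, v)) * v
print(proj)

# The residual u - proj is orthogonal to v (dot product is 0 up to rounding)
print(np.dot(u - proj, v))
```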
Cross product
For $\mathbf{u},\mathbf{v}\in\mathbb{R}^3$,
$$ \mathbf{u}\times\mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k}\\ u_1 & u_2 & u_3\\ v_1 & v_2 & v_3 \end{vmatrix} = \bigl(u_2v_3-u_3v_2, u_3v_1-u_1v_3, u_1v_2-u_2v_1\bigr). $$
The resulting vector is perpendicular to the input pair; its magnitude equals the area of the parallelogram spanned by $\mathbf{u}$ and $\mathbf{v}$.
u = np.array([9, 2, 5])
v = np.array([-3, 8, 2])
c = np.cross(u, v)
print(c) # → [-36 -33  78]
Use the right-hand rule to fix the orientation: curling your fingers from u to v, your thumb points along u × v.
Typical uses.
- Surface normals in graphics shaders (lighting)
- Torque $\boldsymbol{\tau} = \mathbf{r}\times\mathbf{F}$
- Angular momentum $\mathbf{L} = \mathbf{r}\times m\mathbf{v}$
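As a quick numerical check of the claims above (perpendicularity and the parallelogram area), using the same vectors:

```python
import numpy as np

u = np.array([9, 2, 5])
v = np.array([-3, 8, 2])
c = np.cross(u, v)

print(np.dot(c, u), np.dot(c, v))   # 0 0: c is perpendicular to both inputs
print(np.linalg.norm(c))            # magnitude = area of the parallelogram spanned by u and v
```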
Angle between two vectors
From the dot-product identity above:
$$ \theta = \arccos\Bigl(\frac{\mathbf{u}\cdot\mathbf{v}} {\lVert\mathbf{u}\rVert_2\lVert\mathbf{v}\rVert_2}\Bigr), \qquad 0\le\theta\le\pi. $$
u = np.array([9, 2, 5])
v = np.array([-3, 8, 2])
cosθ = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
cosθ = np.clip(cosθ, -1.0, 1.0) # guards against tiny FP overshoots
θ_rad = np.arccos(cosθ)
θ_deg = np.degrees(θ_rad)
print(θ_rad) # → 1.5817 rad
print(θ_deg) # → 90.62°
Edge cases to watch.
- If either vector is the zero vector, the angle is undefined (division-by-zero).
- Numerical rounding can nudge the cosine slightly outside $[-1,1]$; `np.clip` prevents `nan`.
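Wrapping both safeguards into a helper function (a sketch; the name `angle_between` is illustrative, not a NumPy API):

```python
import numpy as np

def angle_between(u, v):
    """Angle in radians between u and v; raises for zero vectors."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0 or nv == 0:
        raise ValueError("angle is undefined for the zero vector")
    cos_theta = np.clip(np.dot(u, v) / (nu * nv), -1.0, 1.0)
    return np.arccos(cos_theta)

print(angle_between(np.array([9, 2, 5]), np.array([-3, 8, 2])))  # ≈ 1.5817
```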
Broadcasting
What broadcasting means, formally
For any binary ufunc $f$ (e.g., `+`, `*`, `np.maximum`), NumPy will try to apply
$$ C = f(A, B) $$
element-wise if and only if the two input shapes are broadcast-compatible. Compatibility is checked right-to-left over the axes:
1. Equal length rule. Pad the shorter shape on the left with 1's so both shapes have the same rank.
2. Axis match rule. For every axis $k$, from the last to the first:
- either $A_k = B_k$, or
- one of them equals 1 (that axis will be stretched).
When axis $k$ is stretched, NumPy does not copy data; it creates a strided view that repeats the existing bytes in memory, so the cost is $O(1)$ extra space.
Tip. Think of a dimension of length 1 as a wildcard that can masquerade as any size.
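To see that the stretch really is a view rather than a copy, `np.broadcast_to` exposes the same machinery explicitly (the stride values shown assume default 64-bit integers):

```python
import numpy as np

row = np.array([1, 2, 3, 4])                 # shape (4,)
stretched = np.broadcast_to(row, (3, 4))     # behaves like three stacked copies of row
print(stretched.shape)                       # (3, 4)
print(stretched.strides)                     # (0, 8): stride 0 along the stretched axis
print(np.shares_memory(row, stretched))      # True: no data were copied
```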
Scalar broadcasting
import numpy as np
arr = np.array([1, 2, 3, 4]) # shape (4,)
alpha = 2                          # shape (), rank-0
print("arr + alpha:", arr + alpha) # [3 4 5 6]
print("arr * alpha:", arr * alpha) # [2 4 6 8]
The scalar behaves like an invisible array of shape `(4,)` here.
Common uses:
- Feature scaling / centering: `X -= X.mean(axis=0)` subtracts the row vector of feature means from every sample at once.
- Softmax trick: `logits - logits.max(axis=1, keepdims=True)` prevents overflow by broadcasting a column vector of maxima (see the sketch below).
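A minimal sketch of the softmax trick mentioned above (the `logits` values are invented for illustration):

```python
import numpy as np

logits = np.array([[2.0, 1.0, 0.1],
                   [10.0, 9.0, 8.0]])            # shape (2, 3)

# keepdims=True keeps the maxima as shape (2, 1), which broadcasts against (2, 3)
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
print(probs.sum(axis=1))                         # [1. 1.]: each row sums to 1
```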
Vector-matrix examples
M = np.arange(12).reshape(3, 4) # shape (3,4)
col = np.array([10, 20, 30])[:,None] # shape (3,1)
row = np.array([1, 2, 3, 4]) # shape (4,)
print("M + col â\n", M + col) # each row shifted by its col entry
print("M + row â\n", M + row) # each column shifted by row entry
Shape algebra (after left-padding):

| Operand | Raw shape | Padded shape | Compatible? |
|---|---|---|---|
| `M` | (3, 4) | (3, 4) | ✅ |
| `col` | (3, 1) | (3, 1) | ✅ (second axis is 1) |
| `row` | (4,) | (1, 4) | ✅ (first axis is 1) |

The result has shape `(3, 4)` in both cases, with no materialised tile of `col` or `row`.
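You can confirm the table with `np.broadcast_shapes` (available in NumPy 1.20 and newer), which reports the broadcast result without allocating anything:

```python
import numpy as np

print(np.broadcast_shapes((3, 4), (3, 1)))   # (3, 4), matching M + col
print(np.broadcast_shapes((3, 4), (4,)))     # (3, 4), matching M + row
```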
When broadcasting fails
a = np.empty((5, 4))
b = np.empty((3, 2, 4))
# a + b -> ValueError: operands could not be broadcast together ...
Reason: after padding, the shapes are `(1, 5, 4)` and `(3, 2, 4)`; the last axis matches (4 vs 4) and axis 0 is fine (1 can stretch to 3), but axis 1 demands 5 vs 2 (neither is 1), so rule 2 fails.
Performance notes
- Aliasing hazards. `out[:] += x` is safe; `out = out + x` makes a new array instead of updating in place (see the sketch below).
- Cache friendliness. Broadcasting keeps the contiguous memory layout of the larger operand; explicit `np.tile` often degrades performance and uses $O(nm)$ extra RAM.
- Higher-order views. Use `np.expand_dims` or `None` (`[:, None]`) to add axes consciously and avoid accidental shape mismatches.
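A small sketch of the aliasing point (the variable names are illustrative):

```python
import numpy as np

out = np.zeros(4)
x = np.ones(4)

before = out            # keep a second reference to the original buffer
out[:] += x             # in-place update: still the same array object
print(out is before)    # True

out = out + x           # builds a brand-new array and rebinds the name
print(out is before)    # False: 'before' still points at the old buffer
```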
Practical Applications
Vectors and their operations are integral to numerous practical applications across various domains. Mastering these concepts enables efficient data manipulation, analysis, and the implementation of complex algorithms.
Accessing and Modifying Multiple Elements
Beyond single-element access, vectors allow for the manipulation of multiple elements simultaneously using slicing or advanced indexing. This capability is essential for batch processing and data transformation tasks.
import numpy as np
# Creating a 1D array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
# Modifying multiple elements
arr[2:5] = [10, 11, 12]
print(arr)
Expected output:
[ 1 2 10 11 12 6 7 8]
- `arr[2:5] = [10, 11, 12]` assigns the values `10`, `11`, and `12` to the elements at indices `2`, `3`, and `4`, respectively.
- The original array `[1, 2, 3, 4, 5, 6, 7, 8]` is updated to `[1, 2, 10, 11, 12, 6, 7, 8]`.
- Batch updating is useful in data cleaning, where multiple data points need correction or transformation, such as replacing outliers or applying scaling factors to specific sections of a dataset (a sketch follows below).
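A minimal data-cleaning sketch (the `prices` values are invented and the indices to correct are assumed known):

```python
import numpy as np

prices = np.array([10.0, 12.0, 11.0, 250.0, 13.0, 12.5])  # one obvious outlier at index 3

prices[:3] *= 1.05        # apply a scaling factor to a specific section
prices[3] = 12.8          # replace a known bad entry with a corrected value

print(prices)             # first three entries scaled, outlier corrected
```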
Boolean Indexing
Boolean indexing enables the selection of elements based on conditional statements, allowing for dynamic and flexible data selection without the need for explicit loops. This technique is highly efficient and widely used in data analysis.
import numpy as np
# Creating a 1D array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
# Boolean indexing
bool_idx = arr > 5
print(arr[bool_idx])
Expected output:
[6 7 8]
- `arr > 5` creates the boolean array `[False, False, False, False, False, True, True, True]`.
- `arr[bool_idx]` uses this boolean array to filter and retrieve the elements where the condition `arr > 5` is `True`, resulting in `[6, 7, 8]`.
- Boolean indexing is used to filter datasets based on specific criteria, such as selecting all records where a sales figure exceeds a certain threshold or extracting all entries that meet particular quality standards (a sketch follows below).
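A short sketch of threshold filtering and conditional replacement (the `sales` figures are invented):

```python
import numpy as np

sales = np.array([120, 530, 80, 910, 260])

high = sales[sales > 250]                    # keep only records above the threshold
print(high)                                  # [530 910 260]

capped = np.where(sales > 500, 500, sales)   # cap extreme values without a loop
print(capped)                                # [120 500  80 500 260]
```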
Summary Table
All examples are self-contained (each row declares the minimal variables it needs), so you can copy-paste any cell directly into any IDE.
| Operation | Description & Formula | Example Code | Expected Output (shape) |
|---|---|---|---|
| Vector Addition | Element-wise sum: $c_i = a_i + b_i$ | `arr_1 = np.array([9, 2, 5]); arr_2 = np.array([-3, 8, 2]); np.add(arr_1, arr_2)` | `[ 6 10  7]`, shape `(3,)` |
| Scalar Multiplication | Scale a vector: $c_i = k a_i$ | `scalar = 2; arr = np.array([6, 3, 4]); scalar * arr` | `[12  6  8]`, shape `(3,)` |
| Dot Product | Projection / cosine similarity: $a \cdot b = \sum_i a_i b_i$ | `arr_1 = np.array([9, 2, 5]); arr_2 = np.array([-3, 8, 2]); np.dot(arr_1, arr_2)` | `-1`, shape `()` |
| Cross Product | 3-D vector orthogonal to both inputs: $a \times b$ | `arr_1 = np.array([9, 2, 5]); arr_2 = np.array([-3, 8, 2]); np.cross(arr_1, arr_2)` | `[-36 -33  78]`, shape `(3,)` |
| Angle Between Vectors | $\theta = \arccos\left(\frac{a\cdot b}{\lVert a\rVert\,\lVert b\rVert}\right)$ | `arr_1 = np.array([9, 2, 5]); arr_2 = np.array([-3, 8, 2]); angle = np.arccos(np.dot(arr_1, arr_2) / (np.linalg.norm(arr_1)*np.linalg.norm(arr_2))); np.round(angle, 3)` | `1.582` rad |
| Broadcasting | NumPy automatically "stretches" smaller shapes so element-wise ops make sense (vector and scalar shown here) | `arr = np.array([1, 2, 3, 4]); scalar = 2; arr + scalar, arr * scalar` | `([3 4 5 6], [2 4 6 8])` |
Tiny Performance Tips:
- Vectorized > loops: every row above is a single, optimized C call.
- `np.dot` & BLAS: use contiguous `float64` arrays for best throughput.
- Broadcast with care: repeated implicit copies are virtual, but an unexpected `np.copy()` downstream can explode memory; check `arr.strides`.