Random Numbers

Last modified: July 27, 2024

This article is written in: 🇺🇸

Random Numbers

NumPy's random module is a powerful tool for generating random numbers from various distributions. Whether you are simulating data, implementing algorithms that require randomness, or performing statistical analysis, NumPy's random module has extensive capabilities to suit your needs.

Generating Random Floats Between 0 and 1

The function np.random.rand() produces an array of random floating-point numbers uniformly distributed over the interval $[0, 1)$ .

Function Signature:

np.random.rand(d0, d1, ..., dn)

Parameters:

$d 0, d 1, . . ., d n$ : Dimensions of the returned array.

Example:

import numpy as np

rand_array = np.random.rand(2, 3)
print(rand_array)

Expected Output:

[[0.51749304 0.05537001 0.68478923]
 [0.62190377 0.40855834 0.89849802]]

Generating Random Numbers from a Standard Normal Distribution

The function np.random.randn() returns numbers from the standard normal distribution, which has a mean of 0 and a standard deviation of 1.

Function Signature:

np.random.randn(d0, d1, ..., dn)

Example:

rand_norm_array = np.random.randn(2, 3)
print(rand_norm_array)

Expected Output:

[[-1.20108323  0.45481233 -0.45698344]
 [ 0.34275595 -1.37612312  1.23458913]]

Generating Random Integers

The function np.random.randint() generates random integers from a specified range.

Function Signature:

np.random.randint(low, high=None, size=None)

Parameters:

low is the parameter that represents the smallest integer in the range.
high is the parameter that defines the upper bound of the range, but it is exclusive, meaning the value is one above the largest possible integer.
size is the parameter that determines the shape of the output array, with the default being a single value.

Example:

rand_integers = np.random.randint(0, 10, size=5)
print(rand_integers)

Expected Output:

[6 3 8 1 9]

Generating Random Floats Over a Specified Range

The function np.random.uniform() generates random floating-point numbers over a specified range $[l o w, h i g h)$ .

Function Signature:

np.random.uniform(low=0.0, high=1.0, size=None)

Example:

rand_uniform_array = np.random.uniform(0.5, 1.5, size=(2, 3))
print(rand_uniform_array)

Expected Output:

[[1.32149298 0.64893357 1.23158464]
 [1.10294322 0.95623745 1.48312411]]

Generating Random Numbers from Other Distributions

NumPy also supports generating random numbers from other statistical distributions, such as binomial, Poisson, exponential, and many more.

Binomial Distribution

The function np.random.binomial() simulates the outcome of performing $n$ Bernoulli trials with success probability $p$ .

np.random.binomial(n, p, size=None)

Example:

rand_binomial = np.random.binomial(10, 0.5, size=5)
print(rand_binomial)

Expected Output:

[4 5 6 7 5]

Poisson Distribution

The function np.random.poisson() generates random numbers from a Poisson distribution with a given mean $λ$ .

np.random.poisson(lam, size=None)

Example:

rand_poisson = np.random.poisson(5, size=5)
print(rand_poisson)

Expected Output:

[3 4 7 2 6]

Exponential Distribution

The function np.random.exponential() generates random numbers from an exponential distribution with a specified scale parameter $β$ .

np.random.exponential(scale=1.0, size=None)

Example:

rand_exponential = np.random.exponential(1.5, size=5)
print(rand_exponential)

Expected Output:

[0.35298273 1.8726912  0.73239216 2.51090448 1.2078675 ]

Setting the Random Seed

To ensure reproducibility of random numbers, you can set the random seed using np.random.seed(). This is particularly useful for debugging or sharing code where you want others to generate the same sequence of random numbers.

Example:

np.random.seed(42)

# Generate random numbers
rand_array1 = np.random.rand(2, 3)
print(rand_array1)

# Reset seed and generate again
np.random.seed(42)
rand_array2 = np.random.rand(2, 3)
print(rand_array2)

Expected Output:

Both arrays will be identical because the seed was reset:

[[0.37454012 0.95071431 0.73199394]
 [0.59865848 0.15601864 0.15599452]]

[[0.37454012 0.95071431 0.73199394]
 [0.59865848 0.15601864 0.15599452]]

Statistics with NumPy

Statistics, at its core, is the science of collecting, analyzing, and interpreting data. It serves as a foundational pillar for fields such as data science, economics, and social sciences. A key component of statistics is understanding various distributions or, as some textbooks refer to them, populations. Central to this understanding is the idea of probability.

NumPy provides robust functions for a range of statistical operations, making it indispensable for data analysis in Python. Below, we explore some of these basic and advanced statistical operations.

Basic Statistical Measures

Mean

The mean or average of a set of values is computed by taking the sum of these values and dividing by the number of values.

$\bar{μ} = \frac{1}{N} \sum_{i = 1}^{N} x_{i}$

Function:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mean_value = np.mean(arr)
print("Mean:", mean_value)

Median

The median is the middle value of an ordered set of values. For an odd number of values, it's the central value. For an even number of values, it's the average of the two middle values.

Function:

median_value = np.median(arr)
print("Median:", median_value)

Variance

Variance quantifies the spread or dispersion of a set of values. It's calculated as the average of the squared differences of each value from the mean.

$σ^{2} = \frac{1}{N} \sum_{i = 1}^{N} (x_{i} - \bar{x})^{2}$

Function:

variance_value = np.var(arr)
print("Variance:", variance_value)

Standard Deviation

The standard deviation measures the average distance between each data point and the mean. It's essentially the square root of variance.

$σ = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (x_{i} - \bar{x})^{2}}$

Function:

std_deviation = np.std(arr)
print("Standard Deviation:", std_deviation)

Advanced Statistical Measures

Percentile

The percentile rank of a score is the percentage of scores in its frequency distribution that are equal to or lower than it.

Function:

percentile_50 = np.percentile(arr, 50)  # Median
print("50th Percentile (Median):", percentile_50)

percentile_90 = np.percentile(arr, 90)
print("90th Percentile:", percentile_90)

Quantile

Quantiles are values that divide a set of observations into equal parts. The 0.25 quantile is equivalent to the 25th percentile.

Function:

quantile_value = np.quantile(arr, 0.25)
print("25th Quantile:", quantile_value)

Skewness

Skewness measures the asymmetry of the probability distribution of a real-valued random variable about its mean.

Function:

from scipy.stats import skew

skewness_value = skew(arr)
print("Skewness:", skewness_value)

Kurtosis

Kurtosis measures the "tailedness" of the probability distribution of a real-valued random variable.

Function:

from scipy.stats import kurtosis

kurtosis_value = kurtosis(arr)
print("Kurtosis:", kurtosis_value)

Correlation

Correlation measures the relationship between two variables and ranges from -1 to 1.

Function:

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

correlation_matrix = np.corrcoef(x, y)
print("Correlation matrix:", correlation_matrix)

Covariance

Covariance indicates the direction of the linear relationship between variables.

Function:

covariance_matrix = np.cov(x, y)
print("Covariance matrix:", covariance_matrix)

Applying Statistics to Multidimensional Data

NumPy allows statistical operations on multidimensional data along specified axes.

Example:

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

mean_rows = np.mean(data, axis=1)
print("Mean of each row:", mean_rows)

mean_columns = np.mean(data, axis=0)
print("Mean of each column:", mean_columns)

Example Application: Descriptive Statistics

To showcase how these statistical functions can be applied in practice, let’s calculate various descriptive statistics for a given dataset.

# Sample dataset
data = np.random.normal(0, 1, 1000)

# Mean
mean = np.mean(data)
print("Mean:", mean)

# Median
median = np.median(data)
print("Median:", median)

# Variance
variance = np.var(data)
print("Variance:", variance)

# Standard Deviation
std_dev = np.std(data)
print("Standard Deviation:", std_dev)

# 25th and 75th Percentiles
q25 = np.percentile(data, 25)
q75 = np.percentile(data, 75)
print("25th Percentile:", q25)
print("75th Percentile:", q75)

# Skewness
skewness = skew(data)
print("Skewness:", skewness)

# Kurtosis
kurt = kurtosis(data)
print("Kurtosis:", kurt)

Reference Table for Statistical Operations

Operation	Description	Formula	NumPy Function	Example Code	Expected Output
Mean	Average of values	$\bar{μ} = \frac{1}{N} \sum_{i = 1}^{N} x_{i}$	`np.mean(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.mean(arr)`	`3.0`
Median	Middle value in an ordered set	-	`np.median(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.median(arr)`	`3.0`
Variance	Average of squared differences from the mean	$σ^{2} = \frac{1}{N} \sum_{i = 1}^{N} (x_{i} - \bar{x})^{2}$	`np.var(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.var(arr)`	`2.0`
Standard Deviation	Average distance of each point from the mean	$σ = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (x_{i} - \bar{x})^{2}}$	`np.std(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.std(arr)`	`1.4142135623730951`
Min	Smallest value	-	`np.min(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.min(arr)`	`1`
Max	Largest value	-	`np.max(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.max(arr)`	`5`
Range	Difference between max and min values	$range = max (x) - min (x)$	`np.ptp(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.ptp(arr)`	`4`
Sum	Sum of all values	$\sum_{i = 1}^{N} x_{i}$	`np.sum(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.sum(arr)`	`15`
Product	Product of all values	$\prod_{i = 1}^{N} x_{i}$	`np.prod(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.prod(arr)`	`120`
Cumulative Sum	Cumulative sum of all values	-	`np.cumsum(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.cumsum(arr)`	`[ 1, 3, 6, 10, 15]`
Cumulative Product	Cumulative product of all values	-	`np.cumprod(arr)`	`arr = np.array([1, 2, 3, 4, 5])` `np.cumprod(arr)`	`[ 1, 2, 6, 24, 120]`
Percentile	Value below which a percentage of data falls	-	`np.percentile(arr, q)`	`arr = np.array([1, 2, 3, 4, 5])` `np.percentile(arr, 50)`	`3.0`
Correlation Coefficient	Measure of linear relationship between arrays	-	`np.corrcoef(arr1, arr2)`	`arr1 = np.array([1, 2, 3])` `arr2 = np.array([4, 5, 6])` `np.corrcoef(arr1, arr2)`	`[[1. 1.] [1. 1.]]`
Covariance	Measure of how much two random variables vary together	-	`np.cov(arr1, arr2)`	`arr1 = np.array([1, 2, 3])` `arr2 = np.array([4, 5, 6])` `np.cov(arr1, arr2)`	`[[1. 1.] [1. 1.]]`

Random Numbers

Generating Random Floats Between 0 and 1

Generating Random Numbers from a Standard Normal Distribution

Generating Random Integers

Generating Random Floats Over a Specified Range

Generating Random Numbers from Other Distributions

Binomial Distribution

Poisson Distribution

Exponential Distribution

Setting the Random Seed

Statistics with NumPy

Basic Statistical Measures

Mean

Median

Variance

Standard Deviation

Advanced Statistical Measures

Percentile

Quantile

Skewness

Kurtosis

Correlation

Covariance

Applying Statistics to Multidimensional Data

Example Application: Descriptive Statistics

Reference Table for Statistical Operations

Table of Contents

Related Articles