Last modified: June 11, 2024


Introduction to Distributions

A distribution is a function that describes how probability is spread over the possible values of a random variable. It helps reveal the underlying patterns and characteristics of a dataset. Distributions are widely used in statistics, data analysis, and machine learning for tasks such as hypothesis testing, constructing confidence intervals, and predictive modeling.

Random Variables

Random variables assign numerical values to outcomes of random processes in probability and statistics. Random variables can be discrete (taking specific values) or continuous (taking any value within a range).
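The distinction can be illustrated with Python's standard `random` module; the variable names below are just for illustration:

```python
import random

random.seed(0)  # fix the seed so the draws are reproducible

# Discrete random variable: the face shown by one roll of a fair die
# (takes only the specific values 1, 2, ..., 6)
die_face = random.randint(1, 6)

# Continuous random variable: a point drawn uniformly from [0, 1)
# (can take any value within that range)
uniform_point = random.random()

print(die_face, uniform_point)
```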

Example: Drawing a Card from a Deck

Example: Weather Forecast

Probability Calculations

Types of Probability Distributions

Example: Probability Distribution of a Discrete Random Variable

Consider a discrete random variable $X$ with the following probability distribution:

| Value of $X$ | Probability $p_X(x)$ |
|--------------|----------------------|
| 1            | 0.05                 |
| 2            | 0.10                 |
| 3            | 0.15                 |
| 4            | 0.20                 |
| 5            | 0.15                 |
| 6            | 0.10                 |
| 7            | 0.08                 |
| 8            | 0.07                 |
| 9            | 0.05                 |
| 10           | 0.05                 |

Interpreting the Table:

Each row pairs a value of $X$ with its probability $p_X(x)$; the probabilities are non-negative and sum to 1. The table can be visualized using a bar graph, with the height of each bar representing the likelihood of each outcome.
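The table can be encoded directly as a PMF and its defining properties checked programmatically; a minimal sketch:

```python
# PMF from the table above, stored as {value: probability}
pmf = {1: 0.05, 2: 0.10, 3: 0.15, 4: 0.20, 5: 0.15,
       6: 0.10, 7: 0.08, 8: 0.07, 9: 0.05, 10: 0.05}

# A valid PMF is non-negative and its probabilities sum to 1
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-9

# The probability of a single outcome can be read off directly, e.g. P(X = 4)
print(pmf[4])  # 0.2
```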


Example: Roll a Six-Sided Die Until a 6 Appears

Roll a fair six-sided die repeatedly until the die shows a 6.

| Number of Rolls | Probability                           |
|-----------------|---------------------------------------|
| 1               | $1/6 \approx 0.1667$                  |
| 2               | $(5/6) \times (1/6) \approx 0.1389$   |
| 3               | $(5/6)^2 \times (1/6) \approx 0.1157$ |
| 4               | $(5/6)^3 \times (1/6) \approx 0.0964$ |
| 5               | $(5/6)^4 \times (1/6) \approx 0.0803$ |
| 6               | $(5/6)^5 \times (1/6) \approx 0.0669$ |


Find the probability that the first 6 appears:

  1. On the third roll.
  2. On the third or fourth roll.
  3. In less than five rolls.
  4. In no more than three rolls.
  5. After three rolls.
  6. In at least three rolls.

Now let's work through each calculation:

  1. $P(3) = (5/6)^2 \times (1/6) \approx 0.1157$
  2. $P(3 \text{ or } 4) = P(3) + P(4) \approx 0.1157 + 0.0964 = 0.2121$
  3. $P(X < 5) = P(1) + P(2) + P(3) + P(4) \approx 0.1667 + 0.1389 + 0.1157 + 0.0964 = 0.5177$
  4. $P(X \leq 3) = P(1) + P(2) + P(3) \approx 0.1667 + 0.1389 + 0.1157 = 0.4213$
  5. $P(X > 3) = 1 - P(X \leq 3) \approx 1 - 0.4213 = 0.5787$
  6. $P(X \geq 3) = 1 - P(X \leq 2) \approx 1 - (0.1667 + 0.1389) = 0.6944$
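The pattern behind these answers is the geometric distribution: the first 6 on roll $k$ requires $k - 1$ misses followed by a hit. A small sketch to verify the numbers:

```python
# Geometric distribution: P(first 6 on roll k) = (5/6)^(k-1) * (1/6)
def p_first_six(k):
    return (5 / 6) ** (k - 1) * (1 / 6)

p3 = p_first_six(3)                                    # 1. on the third roll
p3_or_4 = p_first_six(3) + p_first_six(4)              # 2. third or fourth roll
p_lt_5 = sum(p_first_six(k) for k in range(1, 5))      # 3. fewer than five rolls
p_le_3 = sum(p_first_six(k) for k in range(1, 4))      # 4. no more than three rolls
p_gt_3 = 1 - p_le_3                                    # 5. after three rolls
p_ge_3 = 1 - sum(p_first_six(k) for k in range(1, 3))  # 6. at least three rolls

# Sanity check: "after three rolls" is the same event as three misses in a row
assert abs(p_gt_3 - (5 / 6) ** 3) < 1e-12

print(round(p3, 4), round(p_lt_5, 4), round(p_ge_3, 4))
```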

Example: Number of Pets Owned by Individuals

Consider the following probability distribution for the number of pets $X$ owned by individuals.

| $X$ | $P(X)$ |
|-----|--------|
| 0   | 0.28   |
| 1   | 0.35   |
| 2   | 0.22   |
| 3   | 0.10   |
| 4   | 0.04   |
| 5   | 0.01   |

Find the probability that an individual owns:

  1. Less than 2 pets. $P(X < 2) = P(X = 0) + P(X = 1) = 0.28 + 0.35 = 0.63$
  2. More than 3 pets. $P(X > 3) = P(X = 4) + P(X = 5) = 0.04 + 0.01 = 0.05$
  3. 1 or 4 pets. $P(X = 1 \text{ or } X = 4) = P(X = 1) + P(X = 4) = 0.35 + 0.04 = 0.39$
  4. At most 3 pets. $P(X \leq 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.28 + 0.35 + 0.22 + 0.10 = 0.95$
  5. 2 or fewer, or more than 4 pets. $P(X \leq 2 \text{ or } X > 4) = P(X \leq 2) + P(X > 4) = (0.28 + 0.35 + 0.22) + 0.01 = 0.86$
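Each query above reduces to summing PMF entries over the values in the event; a small sketch:

```python
# PMF of the number of pets, taken from the table above
pets_pmf = {0: 0.28, 1: 0.35, 2: 0.22, 3: 0.10, 4: 0.04, 5: 0.01}

def prob(event):
    """Sum the PMF over every value that satisfies the predicate."""
    return sum(p for value, p in pets_pmf.items() if event(value))

print(prob(lambda v: v < 2))             # ≈ 0.63
print(prob(lambda v: v > 3))             # ≈ 0.05
print(prob(lambda v: v == 1 or v == 4))  # ≈ 0.39
print(prob(lambda v: v <= 3))            # ≈ 0.95
print(prob(lambda v: v <= 2 or v > 4))   # ≈ 0.86
```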

Expected Value

Calculating Expected Value

Interpretation

Example: Expected Value in a Dice Roll

Consider a single roll of a fair six-sided die. Each face, numbered from 1 to 6, is equally likely to appear, so each outcome has probability $\frac{1}{6}$.

Step-by-Step Calculation of Expected Value:

I. List All Possible Outcomes and Their Probabilities.

| Outcome (X) | Probability $P(X)$ |
|-------------|--------------------|
| 1           | 1/6                |
| 2           | 1/6                |
| 3           | 1/6                |
| 4           | 1/6                |
| 5           | 1/6                |
| 6           | 1/6                |

II. Multiply Each Outcome by Its Probability.

III. Sum Up the Products.

$$ E(X) = (1 \times \frac{1}{6}) + (2 \times \frac{1}{6}) + (3 \times \frac{1}{6}) + (4 \times \frac{1}{6}) + (5 \times \frac{1}{6}) + (6 \times \frac{1}{6}) $$

$$E (X) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} $$

$$ E(X) = 3.5 $$
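The three steps above can be reproduced in a few lines; a minimal sketch:

```python
# Step I: list outcomes and probabilities as a PMF (each face has probability 1/6)
pmf = {x: 1 / 6 for x in range(1, 7)}

# Steps II and III: multiply each outcome by its probability and sum the products
expected = sum(x * p for x, p in pmf.items())
print(expected)  # ≈ 3.5
```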

IV. Interpretation: The expected value of 3.5 means that, averaged over many rolls, the outcome of a fair die tends toward 3.5, even though 3.5 can never appear on any single roll; the expected value is a long-run average, not a guaranteed outcome.

Probability Density Function (PDF) - Continuous Variables

For continuous random variables, the PDF gives the probability density at a specific point $x$. The area under the curve between two points $a$ and $b$ is the probability that the variable falls within that range: $P(a \leq X \leq b) = \int_a^b f_X(x)\,dx$.

$$f_X(x)$$

Properties:

  1. Non-negative: $f_X(x) \geq 0$ for all $x$.
  2. Normalization: the total area under the curve is 1, i.e., $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

Joint PDF for Multiple Variables: For two continuous random variables $X$ and $Y$, the joint PDF $f_{X,Y}(x, y)$ gives the density at a particular point $(x, y)$.
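The normalization property can be checked numerically; a sketch using the standard normal PDF and a hand-rolled trapezoidal rule (both written out here to stay dependency-free):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of the normal distribution N(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=10_000):
    """Trapezoidal approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

# Total area under the PDF is 1 (the tails beyond +/-10 are negligible)
print(integrate(normal_pdf, -10, 10))  # ≈ 1.0

# Area between two points is a probability: P(-1 <= X <= 1), the familiar "68%"
print(integrate(normal_pdf, -1, 1))    # ≈ 0.6827
```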


Probability Mass Function (PMF) - Discrete Variables

For discrete random variables, the PMF specifies the probability that the variable takes a particular value $x$, so the probability of any specific outcome can be read off directly.

$$ p_X(x) = P(X = x) $$

Properties:

  1. Non-negative: $p_X(x) \geq 0$ for all $x$.
  2. Sum to One: the probabilities over all possible values of $X$ sum to 1, i.e., $\sum_x p_X(x) = 1$.

Joint PMF for Multiple Variables: For two discrete random variables $X$ and $Y$, the joint PMF $p_{X,Y}(x, y)$ gives the probability of $X$ and $Y$ simultaneously taking values $x$ and $y$, respectively.
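For two independent variables the joint PMF factors into the product of the marginals; a small sketch for two fair dice, using exact fractions to avoid rounding:

```python
from fractions import Fraction
from itertools import product

# Joint PMF of two independent fair dice: p(x, y) = p_X(x) * p_Y(y)
p = Fraction(1, 6)
joint = {(x, y): p * p for x, y in product(range(1, 7), repeat=2)}

# Joint probabilities sum to 1
assert sum(joint.values()) == 1

# Summing over y recovers the marginal PMF of X
marginal_x = {x: sum(joint[(x, y)] for y in range(1, 7)) for x in range(1, 7)}
assert all(q == Fraction(1, 6) for q in marginal_x.values())

# P(X = Y): both dice show the same face
print(sum(q for (x, y), q in joint.items() if x == y))  # 1/6
```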


Cumulative Distribution Function (CDF) - Both Continuous and Discrete Variables

The CDF gives the probability that a random variable is less than or equal to a specific value $x$, which makes it the natural tool for computing the probability that the variable falls below a given threshold.

$$ F_X(x) = P(X \leq x) $$

Properties:

  1. Non-decreasing function.
  2. $\lim_{x \to \infty} F_X(x) = 1$.
  3. $\lim_{x \to -\infty} F_X(x) = 0$.
  4. Right-continuous: for any $x$ and any sequence $x_n$ decreasing to $x$, $\lim_{n \to \infty} F_X(x_n) = F_X(x)$.

Joint CDF: For two variables $X$ and $Y$, $F(a, b) = P(X \leq a, Y \leq b)$. To derive the marginal distribution of $X$: $F_X(a) = \lim_{b \to \infty} F(a, b)$.
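For a discrete variable such as a fair die, the CDF is a step function; a minimal sketch illustrating the properties above:

```python
import math

# CDF of a fair six-sided die: F(x) = P(X <= x) = floor(x)/6, clipped to [0, 1]
def die_cdf(x):
    return min(max(math.floor(x), 0), 6) / 6

print(die_cdf(3))    # 0.5 -> P(X <= 3)
print(die_cdf(3.7))  # 0.5 -> the CDF is flat between integer outcomes
print(die_cdf(-2))   # 0.0 -> the CDF tends to 0 below the smallest outcome
print(die_cdf(100))  # 1.0 -> the CDF tends to 1 above the largest outcome
```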


Moments and Moment Generating Functions

Moments are key statistical measures that provide insights into the characteristics of a distribution, such as its central tendency, dispersion, and overall shape. Specifically, the $n$th moment of a random variable $X$ around a constant $c$ is defined as the expected value of the $n$th power of the deviation of $X$ from $c$:

$$ E[(X - c)^n] $$

Where $E[\cdot]$ denotes the expected value.

The Moment-Generating Function (MGF) is a powerful tool in the analysis of random variables. For a random variable $X$, the MGF is a function that encapsulates all the moments of $X$. It is defined as the expected value of $e^{tX}$, where $t$ is a real number:

$$ M_X(t) = E[e^{tX}] $$

One of the key properties of the MGF is its ability to generate moments. Specifically, the $n$th moment about the origin is obtained by differentiating the MGF $n$ times with respect to $t$ and then evaluating it at $t = 0$:

$$ E[X^n] = M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t) \bigg|_{t=0} $$

I. Mean (First Moment): The mean or the first moment of $X$ is the expected value of $X$, denoted as $\mu$. It is derived from the first derivative of the MGF at $t=0$:

$$ E[X] = \mu = M_X^{(1)}(0) $$

II. Variance (Second Moment): The variance measures the dispersion of the random variable around its mean. It is the second central moment, and it can be derived from the MGF as follows:

$$ Var(X) = \sigma^2 = E[X^2] - (E[X])^2 = M_X^{(2)}(0) - \left( M_X^{(1)}(0) \right)^2 $$

Here, $M_X^{(1)}(0)$ and $M_X^{(2)}(0)$ represent the first and second derivatives of the MGF evaluated at $t=0$, respectively.
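These relationships can be checked numerically for a fair die by approximating the derivatives of its MGF with central finite differences (the step size `h` is an arbitrary small choice):

```python
import math

# MGF of a fair six-sided die: M(t) = E[e^{tX}] = (1/6) * sum of e^{t*x} for x = 1..6
def mgf(t):
    return sum(math.exp(t * x) for x in range(1, 7)) / 6

h = 1e-4  # finite-difference step
m1 = (mgf(h) - mgf(-h)) / (2 * h)              # approximates M'(0)  = E[X]
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2  # approximates M''(0) = E[X^2]
variance = m2 - m1 ** 2                        # Var(X) = E[X^2] - (E[X])^2

print(round(m1, 4))        # 3.5 (the mean of a fair die)
print(round(variance, 4))  # ≈ 2.9167, i.e. 35/12
```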

Table of Contents

  1. Introduction to Distributions
  2. Random Variables
    1. Example: Drawing a Card from a Deck
    2. Example: Weather Forecast
    3. Probability Calculations
  3. Types of Probability Distributions
    1. Example: Probability Distribution of a Discrete Random Variable
    2. Example: Roll a Six-Sided Die Until a 6 Appears
    3. Example: Number of Pets Owned by Individuals
  4. Expected Value
    1. Calculating Expected Value
    2. Interpretation
    3. Example: Expected Value in a Dice Roll
  5. Probability Density Function (PDF) - Continuous Variables
  6. Probability Mass Function (PMF) - Discrete Variables
  7. Cumulative Distribution Function (CDF) - Both Continuous and Discrete Variables
  8. Moments and Moment Generating Functions