*Last modified: June 11, 2024*

*This article is written in: 🇺🇸 English*

## Introduction to Distributions

A distribution is a function that describes the probabilities of the possible values of a random variable. It helps reveal the underlying patterns and characteristics of a dataset. Distributions are widely used in statistics, data analysis, and machine learning for tasks such as hypothesis testing, confidence intervals, and predictive modeling.

## Random Variables

Random variables assign numerical values to outcomes of random processes in probability and statistics. Random variables can be **discrete** (taking specific values) or **continuous** (taking any value within a range).

### Example: Drawing a Card from a Deck

- **Suit of the Card (Discrete)**: Assigns a category based on the suit (e.g., hearts, spades).
- **Rank of the Card (Discrete)**: Numerical value or face (e.g., 2, 10, King).
- **Color of the Card (Discrete)**: Red or Black.

### Example: Weather Forecast

- **Temperature (Continuous)**: The forecasted temperature in degrees.
- **Chance of Precipitation (Continuous)**: Probability of rain or snow, expressed as a percentage.
- **Wind Speed (Continuous)**: Speed of the wind in kilometers or miles per hour.

### Probability Calculations

- **Discrete Example (Card Drawing)**: If $X$ represents the suit of a card, $P(X = \text{Hearts})$ is the probability of drawing a heart.
- **Continuous Example (Temperature)**: If $Y$ represents temperature, $P(Y > 20^\circ C)$ is the chance that the temperature is above 20 degrees Celsius.
- **General Probability Notations**: Calculate probabilities like $P(X < x)$, $P(X \leq x)$, $P(X > x)$, $P(X \geq x)$, where $x$ is a specific value.
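The discrete card example can be checked by enumeration; this is a minimal Python sketch (the `deck` construction and names are illustrative, not from the original text):

```python
from fractions import Fraction

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# P(X = Hearts): fraction of cards whose suit is Hearts.
p_hearts = Fraction(sum(1 for _, suit in deck if suit == "Hearts"), len(deck))
print(p_hearts)  # 1/4
```

Counting favorable outcomes over equally likely outcomes gives $13/52 = 1/4$, matching the classical definition of probability.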

## Types of Probability Distributions

- **Probability Distribution**: A mathematical description of the likelihood of different outcomes in an experiment or process.
- **Discrete Probability Distributions**: Used for discrete variables (e.g., counting outcomes like the number of heads in coin flips).
- **Continuous Probability Distributions**: Used for continuous variables (e.g., measurements like height or weight).

### Example: Probability Distribution of a Discrete Random Variable

Consider a discrete random variable $X$ with the following probability distribution:

| Value of $X$ | Probability $p_X(x)$ |
|--------------|----------------------|
| 1            | 0.05                 |
| 2            | 0.10                 |
| 3            | 0.15                 |
| 4            | 0.20                 |
| 5            | 0.15                 |
| 6            | 0.10                 |
| 7            | 0.08                 |
| 8            | 0.07                 |
| 9            | 0.05                 |
| 10           | 0.05                 |

Interpreting the Table:

- **Higher Probability Values**: The value 4 has the highest probability (0.20), suggesting that it is the most likely outcome.
- **Comparing Probabilities**: The probability of getting a 4 is higher than getting a 10, as indicated by their respective probabilities (0.20 vs. 0.05).
- **Sum of Probabilities**: The sum of all these probabilities equals 1, confirming that the table represents a complete probability distribution.

This table can be visualized using a bar graph, with the height of each bar representing the likelihood of each outcome.
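The two checks above (mode and total probability) can be scripted directly; a small Python sketch, where the `pmf` dictionary simply transcribes the table:

```python
# PMF from the table above (value -> probability).
pmf = {1: 0.05, 2: 0.10, 3: 0.15, 4: 0.20, 5: 0.15,
       6: 0.10, 7: 0.08, 8: 0.07, 9: 0.05, 10: 0.05}

# A valid distribution must sum to 1; the mode is the most probable value.
total = sum(pmf.values())
mode = max(pmf, key=pmf.get)
print(round(total, 10))  # 1.0
print(mode)              # 4
```

The same dictionary could be passed to a plotting library to produce the bar graph described above.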

### Example: Roll a Six-Sided Die Until a 6 Appears

Roll a fair six-sided die repeatedly until the die shows a 6.

| Number of Rolls | Probability              |
|-----------------|--------------------------|
| 1               | 1/6 ≈ 0.1667             |
| 2               | (5/6) * (1/6) ≈ 0.1389   |
| 3               | (5/6)^2 * (1/6) ≈ 0.1157 |
| 4               | (5/6)^3 * (1/6) ≈ 0.0964 |
| 5               | (5/6)^4 * (1/6) ≈ 0.0803 |
| 6               | (5/6)^5 * (1/6) ≈ 0.0669 |

Find the probability that the first 6 appears:

- On the third roll.
- On the third or fourth roll.
- In less than five rolls.
- In no more than three rolls.
- After three rolls.
- In at least three rolls.

Now let's do calculations:

- $P(3) = (5/6)^2 \times (1/6) \approx 0.1157$
- $P(3 \text{ or } 4) = P(3) + P(4) \approx 0.1157 + 0.0964 = 0.2121$
- $P(X < 5) = P(1) + P(2) + P(3) + P(4) \approx 0.1667 + 0.1389 + 0.1157 + 0.0964 = 0.5177$
- $P(X \leq 3) = P(1) + P(2) + P(3) \approx 0.1667 + 0.1389 + 0.1157 = 0.4213$
- $P(X > 3) = 1 - P(X \leq 3) \approx 1 - 0.4213 = 0.5787$
- $P(X \geq 3) = 1 - P(1) - P(2) \approx 1 - 0.1667 - 0.1389 = 0.6944$ (the complement avoids summing the infinite tail)
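These geometric-distribution calculations follow a single pattern, so they are easy to verify in Python (the helper name `p_first_six` is illustrative):

```python
def p_first_six(k):
    """Probability the first 6 appears on roll k: (5/6)^(k-1) * (1/6)."""
    return (5 / 6) ** (k - 1) * (1 / 6)

p3 = p_first_six(3)                                  # first 6 on the third roll
p_less_5 = sum(p_first_six(k) for k in range(1, 5))  # first 6 in under five rolls
p_at_least_3 = 1 - p_first_six(1) - p_first_six(2)   # complement of the infinite tail

print(round(p3, 4))            # 0.1157
print(round(p_less_5, 4))      # 0.5177
print(round(p_at_least_3, 4))  # 0.6944
```

Note how $P(X \geq 3)$ is computed via the complement rule rather than by summing infinitely many terms.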

### Example: Number of Pets Owned by Individuals

Consider the following probability distribution for the number of pets $X$ owned by individuals (writing the variable as $X$ keeps it distinct from the probability symbol $P$).

| $x$ | $P(X = x)$ |
|-----|------------|
| 0   | 0.28       |
| 1   | 0.35       |
| 2   | 0.22       |
| 3   | 0.10       |
| 4   | 0.04       |
| 5   | 0.01       |

Find the probability that an individual owns:

- Less than 2 pets: $P(X < 2) = P(X = 0) + P(X = 1) = 0.28 + 0.35 = 0.63$
- More than 3 pets: $P(X > 3) = P(X = 4) + P(X = 5) = 0.04 + 0.01 = 0.05$
- 1 or 4 pets: $P(X = 1 \text{ or } X = 4) = P(X = 1) + P(X = 4) = 0.35 + 0.04 = 0.39$
- At most 3 pets: $P(X \leq 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.28 + 0.35 + 0.22 + 0.10 = 0.95$
- 2 or fewer, or more than 4 pets: $P(X \leq 2 \text{ or } X > 4) = P(X \leq 2) + P(X > 4) = (0.28 + 0.35 + 0.22) + 0.01 = 0.86$
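Each of these answers is a sum of selected PMF entries, which a few lines of Python can confirm (the dictionary transcribes the table above):

```python
# Number of pets -> probability, from the table above.
pmf = {0: 0.28, 1: 0.35, 2: 0.22, 3: 0.10, 4: 0.04, 5: 0.01}

p_less_2 = pmf[0] + pmf[1]
p_at_most_3 = sum(p for x, p in pmf.items() if x <= 3)
p_le2_or_gt4 = sum(p for x, p in pmf.items() if x <= 2 or x > 4)

print(round(p_less_2, 2))      # 0.63
print(round(p_at_most_3, 2))   # 0.95
print(round(p_le2_or_gt4, 2))  # 0.86
```

Because the two events in the last case are mutually exclusive, their probabilities simply add.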

## Expected Value

- The expected value (often denoted as $E(X)$ or $\mu$) is a fundamental concept in probability, representing the average or mean value of a random variable over a large number of trials or observations.
- It is calculated as a weighted average of all possible values, with weights being their respective probabilities.
- The expected value alone might not be sufficient to understand a distribution fully, especially if the distribution is skewed or has heavy tails.

### Calculating Expected Value

- For a discrete random variable: $E(X) = \sum [x_i \times P(x_i)]$, where $x_i$ are the possible values and $P(x_i)$ their probabilities.
- For a continuous random variable, it involves integrating the product of the variable's value and its probability density function.
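The discrete formula translates directly into code; a minimal sketch applying it to the pet-ownership distribution from the earlier example:

```python
def expected_value(pmf):
    """E(X) = sum of x * P(x) over a discrete PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Pet-ownership distribution from the earlier example.
pets = {0: 0.28, 1: 0.35, 2: 0.22, 3: 0.10, 4: 0.04, 5: 0.01}
print(round(expected_value(pets), 2))  # 1.3
```

So an individual owns 1.3 pets on average, even though no one owns exactly 1.3 pets, which illustrates the long-run-average interpretation below.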

### Interpretation

- The expected value provides a measure of the 'center' of a probability distribution.
- It does not necessarily correspond to the most probable value but is a long-run average if an experiment is repeated many times.

### Example: Expected Value in a Dice Roll

Consider rolling a fair six-sided die. Each face, numbered from 1 to 6, has an equal probability of appearing on a single roll. The probability of each outcome is $\frac{1}{6}$.

Step-by-Step Calculation of Expected Value:

I. List All Possible Outcomes and Their Probabilities.

| Outcome $X$ | Probability $P(X)$ |
|-------------|--------------------|
| 1           | 1/6                |
| 2           | 1/6                |
| 3           | 1/6                |
| 4           | 1/6                |
| 5           | 1/6                |
| 6           | 1/6                |

II. Multiply Each Outcome by Its Probability.

- For 1: $1 \times \frac{1}{6}$
- For 2: $2 \times \frac{1}{6}$
- For 3: $3 \times \frac{1}{6}$
- For 4: $4 \times \frac{1}{6}$
- For 5: $5 \times \frac{1}{6}$
- For 6: $6 \times \frac{1}{6}$

III. Sum Up the Products.

$$ E(X) = (1 \times \frac{1}{6}) + (2 \times \frac{1}{6}) + (3 \times \frac{1}{6}) + (4 \times \frac{1}{6}) + (5 \times \frac{1}{6}) + (6 \times \frac{1}{6}) $$

$$E(X) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6}$$

$$ E(X) = 3.5 $$

IV. Interpretation:

- The expected value $E(X) = 3.5$ suggests that over a large number of dice rolls, the average value of the outcomes will converge to 3.5.
- It's important to note that while 3.5 is not a possible outcome of a single roll, it represents the long-term average or the 'center' of the distribution of outcomes.
- This concept is a cornerstone in probability theory, providing a predictive measure of the behavior of a random variable over many trials.

## Probability Density Function (PDF) - Continuous Variables

For continuous random variables, the PDF provides the probability density at a specific point $x$. The area under the curve between two points on the PDF represents the probability of the variable falling within that range.

$$f_X(x)$$

Properties:

- Non-negative: $f_X(x) \geq 0$ for all $x$.
- Normalization: The total area under the curve of the PDF is 1.
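Both properties can be checked numerically for a concrete PDF; a sketch using the exponential distribution $f(x) = \lambda e^{-\lambda x}$ as one simple continuous example (the trapezoidal `integrate` helper is illustrative):

```python
import math

def exp_pdf(x, lam=1.0):
    """PDF of an exponential distribution with rate lam (0 for x < 0)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, n=10_000):
    """Trapezoidal approximation of the area under f between a and b."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

total_area = integrate(exp_pdf, 0, 50)  # tail beyond 50 is negligible
slice_prob = integrate(exp_pdf, 1, 2)   # P(1 < X < 2), area under the curve

print(round(total_area, 3))  # 1.0
print(round(slice_prob, 4))  # 0.2325, i.e. e^-1 - e^-2
```

The second integral illustrates the key point: for a continuous variable, probabilities correspond to areas under the PDF, not to values of the PDF itself.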

**Joint PDF for Multiple Variables**: For two continuous random variables $X$ and $Y$, the joint PDF $f_{X,Y}(x, y)$ gives the density at a particular point $(x, y)$.

## Probability Mass Function (PMF) - Discrete Variables

For discrete random variables, the PMF specifies the probability of the variable taking a particular value $x$, so the probability of a specific outcome can be read off directly.

$$ p_X(x) = P(X = x) $$

Properties:

- Non-negative: $p_X(x) \geq 0$ for all $x$.
- Sum to One: The sum of all probabilities for all possible values of $X$ is 1.

**Joint PMF for Multiple Variables**: For two discrete random variables $X$ and $Y$, the joint PMF $p_{X,Y}(x, y)$ gives the probability of $X$ and $Y$ simultaneously taking values $x$ and $y$, respectively.
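A joint PMF and its marginals can be illustrated with two independent fair dice, where every pair $(x, y)$ has probability $1/36$ (this concrete example is added here for illustration):

```python
from fractions import Fraction

# Joint PMF of two independent fair dice: p(x, y) = 1/36 for every pair.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# Marginal PMF of X: sum the joint PMF over all values of y.
marginal_x = {x: sum(p for (a, _), p in joint.items() if a == x)
              for x in range(1, 7)}

print(marginal_x[1])        # 1/6
print(sum(joint.values()))  # 1
```

Summing the joint PMF over one variable recovers the other variable's marginal PMF, and the grand total confirms the sum-to-one property.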

## Cumulative Distribution Function (CDF) - Both Continuous and Discrete Variables

The CDF gives the probability that a random variable is less than or equal to a specific value $x$; it is used to calculate the probability of the variable falling at or below a certain threshold.

$$ F_X(x) = P(X \leq x) $$

Properties:

- Non-decreasing function.
- $\lim_{x \to \infty} F_X(x) = 1$.
- $\lim_{x \to -\infty} F_X(x) = 0$.
- Right-continuous: for any $x$ and any decreasing sequence $x_n \downarrow x$, $\lim_{n \to \infty} F_X(x_n) = F_X(x)$.

**Joint CDF**: For two variables $X$ and $Y$, $F(a, b) = P(X \leq a, Y \leq b)$. To derive the marginal distribution of $X$: $F_X(a) = \lim_{b \to \infty} F(a, b)$.
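For the earlier die-rolling example, the CDF has the closed form $F_X(k) = 1 - (5/6)^k$, which can be checked against the summed probabilities (the function name is illustrative):

```python
def geometric_cdf(k, p=1/6):
    """P(X <= k) for the first-success example: 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k

print(round(geometric_cdf(3), 4))  # 0.4213, matching P(X <= 3) from the die example
print(round(geometric_cdf(4), 4))  # 0.5177, matching P(X < 5)
```

The values agree with the bullet-point sums computed earlier, and the CDF is visibly non-decreasing in $k$, as the properties above require.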

## Moments and Moment Generating Functions

Moments are key statistical measures that provide insights into the characteristics of a distribution, such as its central tendency, dispersion, and overall shape. Specifically, the $n$th moment of a random variable $X$ around a constant $c$ is defined as the expected value of the $n$th power of the deviation of $X$ from $c$:

$$ E[(X - c)^n] $$

Where $E[\cdot]$ denotes the expected value.

The Moment-Generating Function (MGF) is a powerful tool in the analysis of random variables. For a random variable $X$, the MGF is a function that encapsulates all the moments of $X$. It is defined as the expected value of $e^{tX}$, where $t$ is a real number:

$$ M_X(t) = E[e^{tX}] $$

One of the key properties of the MGF is its ability to generate moments. Specifically, the $n$th moment about the origin is obtained by differentiating the MGF $n$ times with respect to $t$ and then evaluating it at $t=0$:

I. **Mean (First Moment):** The mean or the first moment of $X$ is the expected value of $X$, denoted as $\mu$. It is derived from the first derivative of the MGF at $t=0$:

$$ E[X] = \mu = M_X^{(1)}(0) $$

II. **Variance (Second Moment):** The variance measures the dispersion of the random variable around its mean. It is the second central moment, and it can be derived from the MGF as follows:

$$ Var(X) = \sigma^2 = E[X^2] - (E[X])^2 = M_X^{(2)}(0) - \left( M_X^{(1)}(0) \right)^2 $$

Here, $M_X^{(1)}(0)$ and $M_X^{(2)}(0)$ represent the first and second derivatives of the MGF evaluated at $t=0$, respectively.
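These relations can be verified numerically for a fair six-sided die, whose MGF is $M_X(t) = \frac{1}{6}\sum_{k=1}^{6} e^{tk}$; a sketch using finite differences to approximate the derivatives at $t = 0$ (step size chosen for illustration):

```python
import math

def mgf(t):
    """MGF of a fair six-sided die: M(t) = (1/6) * sum of e^(t*k) for k = 1..6."""
    return sum(math.exp(t * k) for k in range(1, 7)) / 6

h = 1e-5
# Central finite differences approximate the first two derivatives at t = 0.
m1 = (mgf(h) - mgf(-h)) / (2 * h)             # M'(0)  = E[X]
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2   # M''(0) = E[X^2]
variance = m2 - m1 ** 2

print(round(m1, 4))        # 3.5
print(round(variance, 4))  # 2.9167, i.e. 35/12
```

The recovered mean matches the dice expected value computed earlier, and the variance agrees with the known value $35/12$ for a fair die, illustrating how the MGF's derivatives at $t = 0$ generate the moments.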