Statistics Notes

Introduction to Distributions 🇺🇸

May 11, 2025

A distribution is a function that describes how probability is spread over the possible values of a random variable. It helps to understand the underlying patterns and characteristics of a dataset. Distributions are widely used in statistics, data analysis, and machine learning for tasks such as hypothesis testing, confidence interva...

Statistical Moments 🇺🇸

May 11, 2025

Category: Statistics Notes

In both statistics and mechanics, a moment measures how much "leverage" the values of a quantity exert about a chosen reference point. In statistics the leverage is exerted by probability mass, in mechanics by physical mass, but the mathematics is identical: take a distance from the reference ...

Simple Linear Regression 🇺🇸

May 06, 2025

Category: Statistics Notes

Simple linear regression is a statistical method used to model the relationship between a single dependent variable and one independent variable. It aims to find the best-fitting straight line through the data points, which can be used to predict the dependent variable based on the independent varia...

Yule-Walker Equations 🇺🇸

May 04, 2025

Category: Statistics Notes

The Yule-Walker equations are a set of linear relationships that tie the autocovariances/autocorrelations of a stationary autoregressive AR($p$) process to its parameters. They are the workhorse for parameter estimation, diagnostic checking, and theoretical analysis of AR models...

Bayesian vs Frequentist 🇺🇸

May 03, 2025

Category: Statistics Notes

Bayesian and frequentist statistics are two distinct approaches to statistical inference. Both approaches aim to make inferences about an underlying population based on sample data. However, the way they interpret probability and handle uncertainty is fundamentally different...

Multiple Comparisons 🇺🇸

March 26, 2025

Category: Statistics Notes

When conducting multiple hypothesis tests simultaneously, the likelihood of committing at least one Type I error (falsely rejecting a true null hypothesis) increases. This phenomenon is known as the "multiple comparisons problem" or the "look-elsewhere effect". The methods to addres...

Probability Tree 🇺🇸

February 20, 2025

Category: Statistics Notes

Probability trees are a visual representation of all possible outcomes of a probabilistic experiment and the paths leading to these outcomes. They are especially helpful in understanding sequences of events, particularly when these events are conditional on previous outcomes...

Logistic Regression 🇺🇸

February 07, 2025

Category: Statistics Notes

Logistic regression is a statistical method used for modeling the probability of a binary outcome based on one or more predictor variables. It is widely used in various fields such as medicine, social sciences, and machine learning for classification problems where the dependent variable is dichotom...

ARIMA Models 🇺🇸

October 08, 2024

Category: Statistics Notes

ARMA, ARIMA, and SARIMA are models commonly used to analyze and forecast time series data. ARMA (AutoRegressive Moving Average) combines two ideas: using past values to predict current ones (autoregression) and smoothing out noise using past forecast errors (moving average). ARIMA (AutoRegressive In...

Descriptive Statistics 🇺🇸

May 19, 2024

Category: Statistics Notes

Descriptive statistics offer a summary of the main characteristics of a dataset or sample. They facilitate the understanding and interpretation of data by providing measures of central tendency, dispersion, and shape. In this section, we will discuss the essential concepts and measures in descriptiv...

Backward Shift Operator 🇺🇸

January 24, 2024

Category: Statistics Notes

The backward shift operator (denoted by $B$) is a powerful tool in time series analysis, used to simplify the notation and manipulation of time series models. The operator shifts the time index of a time series back by one period, making it useful in autoregressive, moving average, and mixed models...

Resampling 🇺🇸

January 21, 2024

Category: Statistics Notes

Statistical inference often involves estimating population parameters and constructing confidence intervals based on sample data. Traditional methods rely on assumptions about the sampling distribution of estimators, such as normality and known standard errors. However, these assumptions may not hol...

Covariance 🇺🇸

December 23, 2023

Category: Statistics Notes

Covariance is a fundamental statistical measure that quantifies the degree to which two random variables change together. It indicates the direction of the linear relationship between variables...

Analysis of Categorical Data 🇺🇸

October 10, 2023

Category: Statistics Notes

The chi-square ($\chi^2$) test is a statistical method used to determine if there is a significant difference between expected and observed frequencies in one or more categories. It helps assess whether any observed deviations could be due to chance...

Seasonality and Trends 🇺🇸

October 06, 2023

Category: Statistics Notes

Seasonality and trends are fundamental components in time series data that significantly impact analysis and forecasting. Understanding and correctly modeling these elements are useful for accurate predictions and effective time series modeling...

Exponential Distribution 🇺🇸

September 27, 2023

Category: Statistics Notes

The exponential distribution is a continuous probability distribution that models the time between events in a Poisson point process. The exponential distribution is denoted as $X \sim \text{Exp}(\lambda)$, where $\lambda$ is the rate parameter...

Uniform Distribution 🇺🇸

August 21, 2023

Category: Statistics Notes

A continuous random variable X follows a uniform distribution over an interval [a, b] if it has a constant probability density over that interval. The uniform distribution is denoted as $X \sim \text{Uniform}(a, b)$...

Chi Square Distribution 🇺🇸

August 05, 2023

Category: Statistics Notes

A chi-square distribution is a continuous probability distribution of the sum of the squares of k independent standard normal random variables. The chi-square distribution is denoted as $X \sim \chi^2(k)$, where k is the number of degrees of freedom...

F Distribution 🇺🇸

July 30, 2023

Category: Statistics Notes

The F-distribution, also known as the Fisher-Snedecor distribution, is a continuous probability distribution that arises in hypothesis testing when comparing the variances of two normally distributed populations. The F-distribution is denoted as $X \sim F(d_1, d_2)$, where $d_1$ and $d_2$ are the de...

Gamma Distribution 🇺🇸

April 21, 2023

Category: Statistics Notes

The gamma distribution models a continuous random variable X such as the time until an event occurs a specific number of times. It is a two-parameter family of continuous probability distributions and is often denoted as $X \sim \text{Gamma}(\alpha, \beta)$, where ...

Multiple Regression 🇺🇸

March 08, 2023

Category: Statistics Notes

Multiple linear regression is a statistical technique used to model the relationship between a single dependent variable and two or more independent variables. It extends the concept of simple linear regression by incorporating multiple predictors to explain the variability in the dependent variable...

Autoregressive Models 🇺🇸

February 12, 2023

Category: Statistics Notes

Autoregressive (AR) models are fundamental tools in time series analysis, used to describe and forecast time-dependent data. An AR model predicts future values based on a linear combination of past observations. The order of an AR model, denoted as $p$, indicates how many lagged past values are used...

Total Probability 🇺🇸

December 29, 2022

Category: Statistics Notes

The law of total probability allows for the computation of the probability of an event A based on a set of mutually exclusive and exhaustive events. It's particularly useful when the overall sample space is divided into several distinct scenarios, or partitions, that cover all possible outcomes. The...

Stationarity 🇺🇸

December 26, 2022

Category: Statistics Notes

Stationarity is an important idea in time series analysis. A time series is considered stationary if its statistical properties—like the mean, variance, and autocovariance—stay constant over time. This matters because methods like ARIMA and ARMA are designed to work with stationary data, so it’s a g...

Autocorrelation Function 🇺🇸

December 17, 2022

Category: Statistics Notes

In time series analysis, understanding the relationships between observations at different time lags is crucial for model identification and forecasting. Two essential tools for analyzing these relationships are the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF)...

Normal Distribution 🇺🇸

December 02, 2022

Category: Statistics Notes

A continuous random variable X follows a normal distribution, denoted as $X \sim \mathcal{N}(\mu,\,\sigma^{2})$. The normal distribution is characterized by its bell shape and symmetry. The majority of the values are concentrated around the mean, and extreme values become increasingly rare far from the mean. It can be viewed as ...

Analysis of Variance 🇺🇸

April 30, 2022

Category: Statistics Notes

Does peer assessment enhance student learning...

Bayes Theorem 🇺🇸

February 12, 2022

Category: Statistics Notes

Bayes' theorem provides a way to update our probability estimates for an event based on new evidence. It connects the conditional and marginal probabilities of events, allowing us to revise our predictions or hypotheses in light of additional information. The theorem is stated mathematically as...

Introduction to Statistics 🇺🇸

January 13, 2022

Category: Statistics Notes

Statistics is an empirical science, focusing on data-driven insights for real-world applications. This guide offers a concise exploration of statistical fundamentals, aimed at providing practical knowledge for data analysis and interpretation...

Binomial Distribution 🇺🇸

November 20, 2021

Category: Statistics Notes

A discrete random variable X follows a binomial distribution if it represents the number of successes in a fixed number of Bernoulli trials with the same probability of success. The binomial distribution is denoted as $X \sim \text{Binomial}(n, p)$, where n is the number of trials and p is the proba...

Correlation 🇺🇸

August 05, 2021

Category: Statistics Notes

Correlation is a statistical measure that quantifies the strength and direction of the linear relationship between two variables. It is a fundamental concept in statistics, enabling researchers and analysts to understand how one variable may predict or relate to another. The most commonly used corre...

Series 🇺🇸

June 16, 2021

Category: Statistics Notes

A sequence is an ordered list of numbers that can be viewed as a function mapping each natural number $n$ to a specific value $a_n$. More formally, a sequence $\{a_n\}$ is a function whose domain is the set of natural numbers, and the values are called the terms of the sequence...

Log Normal Distribution 🇺🇸

June 14, 2021

Category: Statistics Notes

A continuous random variable X follows a log-normal distribution if its natural logarithm is normally distributed. The log-normal distribution is useful in modeling continuous random variables that are constrained to be positive. It is denoted as $X \sim \text{LogNormal}(\mu, \sigma^2)$, where $\mu...

Geometric Probability 🇺🇸

June 10, 2021

Category: Statistics Notes

Geometric probability is a fascinating branch of probability theory where outcomes are associated with geometric figures and their measures—such as lengths, areas, and volumes—rather than discrete numerical outcomes. It often deals with continuous random variables and employs integral calculus to ca...

Autocovariance Function 🇺🇸

May 29, 2021

Category: Statistics Notes

Autocovariance functions describe how values of a time series relate to their lagged counterparts, measuring the joint variability between a series at time $t$ and its value at a previous time $t-k$ (where $k$ is the lag). In autoregressive models, these relationships are expressed through coefficie...

Hypothesis Testing 🇺🇸

January 25, 2021

Category: Statistics Notes

Hypothesis testing is a tool in statistics that drives much of scientific research. It lets us draw conclusions about entire populations based on the information we collect from samples. You'll find it applied in many areas—from evaluating how well a new drug works in clinical trials to unraveling t...

Forecasting 🇺🇸

December 04, 2020

Category: Statistics Notes

Time series forecasting is a technique used to predict future values based on historical data. It is widely used in various fields, such as finance, economics, and meteorology. In this section, we will discuss the basics of time series forecasting...

Confidence Intervals 🇺🇸

October 31, 2020

Category: Statistics Notes

Confidence intervals (CIs) provide a range of values which are believed, with a certain degree of confidence, to contain a population parameter, like the mean or proportion. They are constructed from a sampled data set and offer an interval estimate for the parameter of interest...

Null Hypothesis 🇺🇸

May 18, 2020

Category: Statistics Notes

Statistical hypothesis testing is a fundamental method used in research to make inferences about populations based on sample data. Understanding the concepts of null and alternative hypotheses, as well as how to calculate and interpret p-values, is crucial for conducting robust and meaningful analys...

Standard Error and LLN 🇺🇸

April 19, 2020

Category: Statistics Notes

Expected Value (E), also known as the mean, is the long-run average of a random variable, representing the value we anticipate on average from repeated random draws from a population...

Central Limit Theorem 🇺🇸

March 26, 2020

Category: Statistics Notes

The Central Limit Theorem (CLT) is a fundamental concept in statistics, explaining why the distribution of sample means approximates a normal distribution, often known as the bell curve, as the sample size becomes larger, irrespective of the shape of the population's original distribution, provided its variance is finite...

Geometric Distribution 🇺🇸

March 16, 2020

Category: Statistics Notes

A discrete random variable X follows a geometric distribution if it represents the number of trials needed to get the first success in a sequence of Bernoulli trials. The geometric distribution is denoted as $X \sim \text{Geometric}(p)$, where p is the probability of success on each trial...

Conditional Probability 🇺🇸

October 09, 2019

Category: Statistics Notes

Conditional probability is the likelihood of an event occurring given that another event has already occurred. It is denoted as $P(A|B)$, representing the probability of event $A$ happening, assuming event $B$ has already taken place. This concept is crucial in understanding dependent events in prob...

Time Series 🇺🇸

August 12, 2019

Category: Statistics Notes

Time series data consists of sequential observations collected over a period of time. This kind of data is prevalent in a range of fields such as finance, economics, climatology, and more. Time series analysis involves the exploration of this data to identify inherent structures such as patterns or ...

Moving Average Models 🇺🇸

August 08, 2019

Category: Statistics Notes

Moving Average (MA) models are a fundamental class of univariate time series models used for forecasting and understanding temporal data. Unlike Autoregressive (AR) models, which rely on past values of the series itself, MA models utilize past forecast errors to model the current value of the series...

Student T Distribution 🇺🇸

July 08, 2019

Category: Statistics Notes

The Student's t-distribution, or simply t-distribution, is a continuous probability distribution that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown. The t-distribution is denoted as ...

Introduction to Probability 🇺🇸

June 18, 2019

Category: Statistics Notes

Probability theory offers a structured approach to assessing the probability of events, allowing for logical and systematic reasoning about their likelihood...

Random Walk 🇺🇸

June 15, 2019

Category: Statistics Notes

The random walk is a fundamental and widely used time series model, often applied in finance to represent stock prices and other economic indicators. The idea behind the random walk is that the value of the process at time $t$ is the sum of its value at time $t-1$ and a random shock (or noise). Esse...

Invertibility 🇺🇸

May 04, 2019

Category: Statistics Notes

In time series modeling, invertibility is the property of a model that allows the innovation process (also called the noise or disturbance process) to be expressed as a function of the observed series and its past values. This is particularly relevant for Moving Average (MA) models...

Negative Binomial Distribution 🇺🇸

April 27, 2019

Category: Statistics Notes

A discrete random variable X follows a negative binomial distribution if it represents the number of trials required to achieve a specified number of successes in a sequence of independent Bernoulli trials. The negative binomial distribution is often denoted as $X \sim \text{NegBinomial}(r, p)$, whe...

Type I and Type II Errors 🇺🇸

January 16, 2019

Category: Statistics Notes

Hypothesis testing is a core concept in statistics that allows researchers to evaluate assumptions about a population by examining sample data. In this process, we start with a null hypothesis, denoted by $H_0$, which represents a baseline or default position, and an alternative hypothesis, $H_a$, w...

Time Series Modeling 🇺🇸

January 10, 2019

Category: Statistics Notes

Time series modeling involves analyzing data points collected or recorded at specific time intervals to understand underlying structures and make forecasts. Various models, such as Autoregressive (AR), Moving Average (MA), and their combinations (ARMA, ARIMA), are employed to capture different aspec...

Difference Equations 🇺🇸

November 06, 2018

Category: Statistics Notes

A difference equation (also known as a recurrence relation) defines each term of a sequence based on previous terms. In some cases, the general term of a sequence is given explicitly (e.g., $a_n = 3n + 2$, resulting in the sequence $5, 8, 11, \dots$). However, more commonly, a difference equation pr...

Statistical Moments and Time Series 🇺🇸

August 08, 2018

Category: Statistics Notes

Understanding the behavior of time series data is helpful in various fields such as finance, economics, and engineering. Statistical moments, particularly the mean and standard deviation, play an important role in characterizing these processes. This section delves into how these moments describe ti...

Normal Curve and z-Score 🇺🇸

August 05, 2018

Category: Statistics Notes

A normal distribution (often referred to as the normal curve or Gaussian distribution) is a continuous probability distribution that is symmetric about the mean, where most of the observations cluster around the central peak and taper off symmetrically towards both ends. Many real-world datasets suc...

Poisson Distribution 🇺🇸

July 04, 2018

Category: Statistics Notes

A discrete random variable X follows a Poisson distribution if it counts the number of events in a fixed interval, where the events occur independently and at a constant average rate. The Poisson distribution is denoted as $X \sim \text{Poisson}(\lambda)$, where $\lambda$ is the average rate (or mean) of events occurring in a given interval...

Metrics 🇺🇸

May 18, 2018

Category: Statistics Notes

Evaluation metrics are essential tools for assessing the performance of statistical and machine learning models. They provide quantitative measures that help us understand how well a model is performing and where improvements can be made. In both classification and regression tasks, selecting approp...

Axioms of Probability 🇺🇸

May 14, 2018

Category: Statistics Notes

Probability theory is based on a set of principles, or axioms, that define the properties of the probability measure. These axioms, first formalized by the Russian mathematician Andrey Kolmogorov, are the foundation upon which the entire framework of probability is built...

Beta Distribution 🇺🇸

April 17, 2018

Category: Statistics Notes

The beta distribution is a continuous probability distribution used to model random variables that are constrained to intervals of finite length, often [0,1]. It is characterized by two shape parameters, $\alpha$ and $\beta$, and is denoted as $X \sim \text{Be...