Last modified: March 11, 2025


Hypothesis Testing

Hypothesis testing is a core statistical tool that underpins much of scientific research. It lets us draw conclusions about an entire population from the information collected in a sample. It is applied in many areas, from evaluating how well a new drug works in clinical trials to analyzing customer behavior in business analytics.

In statistics, a hypothesis is a testable claim about a population, usually about a parameter such as a mean or a proportion.

Inputs and Outputs of Hypothesis Testing

Inputs:

  1. The null hypothesis ($H_0$) represents the default assumption or the status quo, indicating no effect or no difference. Hypothesis testing aims to challenge this statement.
  2. The alternative hypothesis ($H_1$ or $H_a$) is the claim that the test seeks to support, indicating the presence of an effect or difference.
  3. The significance level ($\alpha$) is a pre-determined threshold, usually set at 0.05, which defines the risk of rejecting the null hypothesis when it is actually true (Type I error).
  4. Sample data refers to the collected data from observations, experiments, or surveys, providing the basis for calculating the test statistic and the p-value.

Output:

The p-value is the probability of observing data as extreme as, or more extreme than, the sample data, assuming the null hypothesis is true. A p-value below the significance level $\alpha$ is taken as sufficient evidence against the null hypothesis.
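The definition of the p-value can be checked directly by simulation. The sketch below uses a hypothetical coin-flip study that is not part of the original article (60 heads in 100 flips, $H_0$: the coin is fair). It repeats the experiment many times under $H_0$ and reports the fraction of simulated runs that are at least as extreme as the observed result; the function name and parameters are illustrative only.

```python
import random

def simulated_p_value(observed_heads: int, n_flips: int,
                      n_sims: int = 10_000, seed: int = 1) -> float:
    """Estimate P(heads >= observed_heads) assuming the coin is fair (H0 true)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # One simulated experiment under H0: n_flips tosses of a fair coin.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims

# Hypothetical data: 60 heads observed in 100 flips.
p = simulated_p_value(observed_heads=60, n_flips=100)
print(f"estimated one-sided p-value: {p:.3f}")
```

The estimate should land near the exact binomial tail probability of about 0.028; increasing `n_sims` tightens it at the cost of run time.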

Overview of Hypothesis Testing Steps

Hypothesis testing is a structured process involving several steps:

  1. The process begins by formulating hypotheses, where the null and alternative hypotheses are defined based on the research question.
  2. Next, a significance level ($\alpha$) is chosen, often set at 0.05, but it can be adjusted depending on the study's requirements or field norms.
  3. Data collection is conducted systematically to ensure the data is representative and free from bias.
  4. After collecting the data, the test statistic is calculated with a formula appropriate to the test, reducing the sample to a single number that measures how far the data depart from what $H_0$ predicts.
  5. The p-value is then determined, representing the probability of obtaining the observed or more extreme test statistic under the null hypothesis.
  6. In decision making, $H_0$ is rejected if the p-value is less than $\alpha$; otherwise, you fail to reject $H_0$.
  7. Finally, interpreting results involves understanding the decision in the context of the research question. Failing to reject $H_0$ does not prove it is true, only that there isn’t strong evidence against it.
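The seven steps can be traced in a small worked sketch. It uses an exact permutation test on hypothetical data (the two groups below are invented for illustration), chosen because it needs only the standard library and enumerates every possible relabelling, so the p-value is exact rather than approximate. The group values, `ALPHA`, and helper names are assumptions, not part of the article.

```python
from itertools import combinations
from statistics import fmean

# Step 1: H0: both groups come from the same distribution (no effect).
#         H1: group A tends to score higher than group B (right-tailed).
# Step 2: fix the significance level before looking at the data.
ALPHA = 0.05

# Step 3: hypothetical sample data (illustrative only).
group_a = [12, 14, 15, 16]
group_b = [9, 10, 11, 13]

def mean_difference(a, b):
    """Step 4: the test statistic is the difference in sample means."""
    return fmean(a) - fmean(b)

def permutation_p_value(a, b):
    """Step 5: exact one-sided p-value over every relabelling of the pooled data."""
    observed = mean_difference(a, b)
    pooled = a + b
    count = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        new_a = [pooled[i] for i in idx]
        new_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabellings at least as extreme as the observed split.
        if mean_difference(new_a, new_b) >= observed - 1e-12:
            count += 1
    return count / total

p_value = permutation_p_value(group_a, group_b)

# Step 6: reject H0 if the p-value is less than alpha.
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"

# Step 7: interpret in context; failing to reject would not prove H0.
print(f"p-value = {p_value:.4f} -> {decision}")
```

Only 2 of the 70 possible splits give a mean difference at least as large as the observed one, so the sketch rejects $H_0$ at $\alpha = 0.05$.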

Example: Marble Bags

Imagine two bags: Bag A with a mix of 5 white and 5 black marbles, and Bag B with only black marbles.

Bag A            Bag B
  _____            _____
 / • •  \         / • •  \
|  • •  |        |  • •  |    O = White Marble
|  O O  |        |  • •  |    • = Black Marble
|  O O  |        |  • •  |    
|  • O  |        |  • •  |
 \_____/          \_____/

Suspecting you have Bag B, you set up the following hypotheses:

$H_0$: you have Bag A (5 white and 5 black marbles).

$H_1$: you have Bag B (only black marbles).

Suppose you draw n marbles one at a time, replacing each marble before the next draw, and every one is black. Under $H_0$ (Bag A), each draw is black with probability 0.5, so the p-value, the probability of drawing n black marbles in a row from Bag A, is $(0.5)^n$.

A smaller p-value indicates stronger evidence against $H_0$. As n increases, $(0.5)^n$ shrinks rapidly, so a long run of black marbles makes Bag A increasingly implausible and supports the claim that you have Bag B.
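To make the marble argument concrete, the short sketch below tabulates $(0.5)^n$ and finds how many consecutive black draws are needed before the result is significant. It assumes draws with replacement (as in the p-value formula) and uses the conventional $\alpha = 0.05$; the function name is illustrative.

```python
ALPHA = 0.05

def marble_p_value(n: int) -> float:
    """P(n black draws in a row | Bag A), drawing with replacement."""
    return 0.5 ** n

for n in range(1, 8):
    p = marble_p_value(n)
    verdict = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"n={n}: p={p:.4f} -> {verdict}")

# Smallest number of consecutive black draws that is significant at ALPHA.
n_needed = next(n for n in range(1, 100) if marble_p_value(n) < ALPHA)
print("draws needed:", n_needed)
```

Four black draws give $p = 0.0625$, which is not enough; five give $p = 0.03125$, which falls below 0.05.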

Null and Alternative Hypotheses for a Mean

When testing a population mean, there are three possible forms of the test. They share the same null hypothesis but differ in the alternative hypothesis:

Types of Tests

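I. Left-Tailed Test

$H_0: \mu = \mu_0$ versus $H_1: \mu < \mu_0$

II. Right-Tailed Test

$H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$

III. Two-Tailed Test

$H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$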

The null hypothesis always assumes that the population mean $\mu$ equals a predetermined value $\mu_0$. The alternative hypothesis presents a contrary statement: the population mean $\mu$ is less than, greater than, or not equal to $\mu_0$.

Important Note: Left-tailed and right-tailed tests are typically used when the effect is expected to occur in only one direction or when only one-directional effects are relevant. In most research scenarios, a two-tailed test is preferred unless there's strong justification for a one-tailed test.
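For a z statistic, each of the three test types has its own rejection region. The sketch below computes the boundaries with Python's standard-library `statistics.NormalDist`, assuming the test statistic follows a standard normal distribution under $H_0$; the function name and dictionary keys are illustrative.

```python
from statistics import NormalDist

def z_critical_values(alpha: float = 0.05) -> dict:
    """Rejection-region boundaries for a z statistic that is N(0, 1) under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return {
        "left": std_normal.inv_cdf(alpha),          # reject H0 if z < this value
        "right": std_normal.inv_cdf(1 - alpha),     # reject H0 if z > this value
        "two": std_normal.inv_cdf(1 - alpha / 2),   # reject H0 if |z| > this value
    }

cv = z_critical_values(0.05)
for kind, value in cv.items():
    print(f"{kind:>5}: {value:.3f}")
```

At $\alpha = 0.05$ the one-tailed boundary is about 1.645 in magnitude, while the two-tailed test splits $\alpha$ between both tails and needs $|z| > 1.96$, which is why a two-tailed test is the more conservative default.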

Examples

I. Testing the Effectiveness of a New Diet (Two-Tailed Test)

II. Evaluating Customer Service Efficiency (Left-Tailed Test)

III. Assessing the Impact of a New Teaching Method (Right-Tailed Test)

The P-value

Once the data is collected and the sample statistic computed, the researcher computes the P-value.

The P-value is the probability of obtaining a measurement at least as extreme as the one we measured, under the assumption that the null hypothesis is true.


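By "at least as extreme," we mean a value at least as far from what $H_0$ predicts as the measured one, in the direction specified by the alternative hypothesis: to the left for a left-tailed test, to the right for a right-tailed test, and in either direction for a two-tailed test.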
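The meaning of "at least as extreme" for each tail can be expressed in a few lines. This sketch assumes a z statistic that is standard normal under $H_0$ and uses a hypothetical observed value of 1.8; the function name and the `tail` labels are illustrative.

```python
from statistics import NormalDist

def z_p_value(z: float, tail: str) -> float:
    """P-value of an observed z statistic, assuming z ~ N(0, 1) under H0."""
    cdf = NormalDist().cdf
    if tail == "left":    # values at least as far to the left
        return cdf(z)
    if tail == "right":   # values at least as far to the right
        return 1 - cdf(z)
    if tail == "two":     # values at least as far from 0 in either direction
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

z_obs = 1.8  # hypothetical observed statistic
for tail in ("left", "right", "two"):
    print(tail, round(z_p_value(z_obs, tail), 4))
```

For the same observation, the two-tailed p-value is twice the right-tailed one, so a result can be significant in a one-tailed test but not in a two-tailed test.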

Choosing the Right Statistical Test

Selecting a suitable statistical test is critical in hypothesis testing, and several factors determine the appropriate choice:

Factors to Consider

  1. The type of data (categorical, ordinal, interval, or ratio) strongly influences the choice of statistical test, as different types of data require different methods.
  2. The number of variables under analysis determines the test type, with specific tests designed for univariate (one variable), bivariate (two variables), and multivariate (more than two variables) analyses.
  3. The distribution of the data is crucial, as many parametric tests (such as t-tests and ANOVA) assume approximately normally distributed data. If the data clearly violate this assumption, a non-parametric test is often more appropriate.
  4. The study design, such as comparing groups or measuring changes over time within a group, also plays a role in selecting the correct statistical test.

Examples of Statistical Tests

The following table summarizes some common statistical tests and their applications:

| Test | Data Type | Groups / Variables | Assumptions |
|---|---|---|---|
| T-Test | Interval/Ratio | Two groups | Normally distributed, independent samples |
| Paired T-Test | Interval/Ratio | Two groups | Normally distributed, dependent samples |
| One-way ANOVA | Interval/Ratio | More than two groups | Normally distributed, independent samples |
| Two-way ANOVA | Interval/Ratio | More than two groups | Normally distributed, independent samples |
| Chi-Square Test | Categorical | Two or more categories | Independent observations, adequate expected counts (commonly at least 5 per cell) |
| Pearson Correlation | Interval/Ratio | Two variables | Normally distributed, linear relationship |
| Spearman Correlation | Ordinal | Two variables | Non-parametric, monotonic relationship |
| Mann-Whitney U Test | Ordinal/Continuous | Two groups | Non-parametric, independent samples |
| Kruskal-Wallis H Test | Ordinal/Continuous | More than two groups | Non-parametric, independent samples |
| Wilcoxon Signed-Rank Test | Ordinal/Continuous | Two groups | Non-parametric, dependent samples |
| Friedman Test | Ordinal/Continuous | More than two groups | Non-parametric, dependent samples |
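The table can be read as a rough decision procedure for comparing groups. The helper below is a hypothetical, deliberately simplified sketch of that reading: it covers only the group-comparison rows (not the correlations or two-way ANOVA), and its string labels for data types are assumptions. Real test selection also depends on study design and on checking assumptions, so treat it as a reading aid rather than a rule.

```python
def suggest_test(data_type: str, groups: int, paired: bool, normal: bool) -> str:
    """Simplified lookup over the group-comparison rows of the table.

    data_type: "interval/ratio", "ordinal", or "categorical" (hypothetical labels)
    groups: number of groups being compared
    paired: True if samples are dependent (e.g. the same subjects measured twice)
    normal: True if the normality assumption is reasonable
    """
    if data_type == "categorical":
        return "Chi-Square Test"
    # Parametric tests need interval/ratio data that is roughly normal.
    parametric = data_type == "interval/ratio" and normal
    if groups == 2:
        if parametric:
            return "Paired T-Test" if paired else "T-Test"
        return "Wilcoxon Signed-Rank Test" if paired else "Mann-Whitney U Test"
    if groups > 2:
        if parametric and not paired:
            return "One-way ANOVA"
        # The table lists no parametric test for more than two dependent
        # groups, so this sketch falls back to the Friedman test.
        return "Friedman Test" if paired else "Kruskal-Wallis H Test"
    raise ValueError("need at least two groups")

print(suggest_test("interval/ratio", 2, paired=False, normal=True))
print(suggest_test("ordinal", 3, paired=True, normal=False))
```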

Example: Hypothesis Test for the Mean

An agronomist suggests that a new fertilizer increases the average yield of a particular crop to more than 2 tons per hectare. To test this claim, a study is conducted where the new fertilizer is applied to randomly selected plots. The yield of 25 plots is measured, resulting in a mean yield of 2.1 tons per hectare and a standard deviation of 0.3 tons per hectare. Is the new fertilizer effective at increasing the average yield at a significance level of $\alpha = 0.05$?

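Hypothesis Setup:

$H_0: \mu = 2$ tons per hectare (the fertilizer does not raise the mean yield above 2)

$H_1: \mu > 2$ tons per hectare (right-tailed test)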

Test Statistic:

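For the test statistic, we use the one-sample z-test, treating the standard deviation of 0.3 tons per hectare as the known population standard deviation $\sigma$. Note that the sample size is only 25, so the usual "n > 30" justification does not apply; if $\sigma$ is estimated from the sample, a one-sample t-test with $n - 1 = 24$ degrees of freedom is more appropriate, and its right-tailed critical value at $\alpha = 0.05$ (about 1.711) exceeds the statistic computed below. Under the z-test, the statistic is: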

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

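where $\bar{x} = 2.1$ is the sample mean, $\mu_0 = 2$ is the hypothesized population mean, $\sigma = 0.3$ is the standard deviation, and $n = 25$ is the sample size.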

Plugging in the values:

$$z = \frac{2.1 - 2}{0.3/\sqrt{25}}$$

$$z = \frac{0.1}{0.06}$$

$$z \approx 1.667$$

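We look up the critical z-value for a right-tailed test at $\alpha = 0.05$, which is approximately 1.645. Since our calculated z-value of 1.667 is greater than 1.645, we reject the null hypothesis. Equivalently, the right-tailed p-value is $1 - \Phi(1.667) \approx 0.048$, where $\Phi$ is the standard normal cumulative distribution function, and this is below $\alpha$.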

There is sufficient evidence at the $\alpha = 0.05$ significance level to support the claim that the new fertilizer increases the average yield of the crop to more than 2 tons per hectare.
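The arithmetic of the fertilizer example can be reproduced in a few lines. This sketch follows the worked solution in treating 0.3 as the known population standard deviation $\sigma$ and uses the standard-library `statistics.NormalDist` for the critical value and the p-value; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the fertilizer example.
x_bar, mu_0, sigma, n, alpha = 2.1, 2.0, 0.3, 25, 0.05

# One-sample z statistic, treating sigma as the known population SD.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Right-tailed test: critical value and p-value from the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The critical-value comparison and the p-value comparison always lead to the same decision; the p-value additionally shows how narrowly this result clears the 0.05 threshold.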

Table of Contents

  1. Hypothesis Testing
    1. Inputs and Outputs of Hypothesis Testing
    2. Overview of Hypothesis Testing Steps
    3. Example: Marble Bags
  2. Null and Alternative Hypotheses for a Mean
    1. Types of Tests
    2. Examples
  3. The P-value
  4. Choosing the Right Statistical Test
    1. Factors to Consider
    2. Examples of Statistical Tests
    3. Example: Hypothesis Test for the Mean