Introduction

In Topic 6.4, we learned to set up hypothesis tests by stating null and alternative hypotheses and checking conditions. Now comes the heart of hypothesis testing: the p-value.

The p-value is perhaps the most important—and most misunderstood—concept in all of statistical inference. It's the number that determines whether we reject the null hypothesis or not. It's reported in virtually every research paper, clinical trial, and scientific study. Yet it's frequently misinterpreted, even by professionals.

Understanding p-values correctly is essential not just for AP Statistics, but for being a critical consumer of research and data. This topic will clarify what p-values actually mean, how to find them, and most importantly, how to interpret them properly.

The Null Distribution

Before we can understand p-values, we need to understand the null distribution.

What is the Null Distribution?

The null distribution is the probability distribution of the test statistic assuming the null hypothesis is true.

Think of it this way: If $H_{0}$ were actually true, what values would we expect to see for our test statistic? The null distribution answers this question.

Creating the Null Distribution

There are two main ways to obtain a null distribution:

Theoretical approach: Use a probability model (like the standard normal distribution) based on mathematical theory
Simulation approach: Generate many samples assuming $H_{0}$ is true and observe the distribution of the test statistic

For the one-sample z-test for a proportion, we use the theoretical approach. The null distribution is approximately a standard normal distribution (z-distribution) when conditions are met.

Why It's Called the "Null" Distribution

We call it the "null" distribution because it describes what happens if the null hypothesis is true. Every calculation we do assumes $H_{0}$ is correct—that's the starting point for hypothesis testing.

What is a p-Value?

The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one we actually observed, given that the null hypothesis is true.

Breaking Down the Definition

Let's unpack each part:

"Probability of obtaining...": The p-value is a probability (between $0$ and $1$ )

"...a test statistic as extreme or more extreme...": "Extreme" means far from what we'd expect if $H_{0}$ were true

"...than what we actually observed...": We compare to our actual sample result

"...given that the null hypothesis is true": This is crucial—the entire calculation assumes $H_{0}$ is correct

The Conditional Nature of p-Values

The p-value is a conditional probability. We're asking: "If the null hypothesis were true, what's the probability of seeing data like ours (or more extreme)?"

What a p-value IS: $P(\text{data as extreme or more extreme} \mid H_{0} \text{ is true})$

What a p-value is NOT: $P(H_{0} \text{ is true} \mid \text{data})$

This distinction is critical. The p-value does NOT tell us the probability that the null hypothesis is true!

An Analogy

Imagine a friend claims they can predict coin flips. They predict $8$ out of $10$ flips correctly.

The p-value answers: "If they were just guessing randomly (that is, if the null hypothesis were true), what's the probability they'd get $8$ or more correct just by chance?"

The p-value does NOT answer: "What's the probability they were guessing?"
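The coin-flip p-value can be computed directly. The sketch below assumes that "guessing randomly" means each prediction is independently correct with probability $0.5$, so the number of correct predictions follows a binomial distribution; the function name `binomial_upper_tail` is only illustrative.

```python
from math import comb

def binomial_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Friend predicts 8 of 10 flips correctly; the null hypothesis is pure guessing.
p_value = binomial_upper_tail(n=10, k=8)
print(round(p_value, 4))  # 0.0547
```

So a pure guesser would get $8$ or more correct about $5.5\%$ of the time. That number says nothing directly about the probability that the friend was guessing.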

Finding p-Values for Different Alternative Hypotheses

How we calculate the p-value depends on the direction of the alternative hypothesis. Let's say our test statistic is $z = x$ (some calculated value).

One-Sided Alternative: Greater Than Hₐ: p > p₀

When the alternative hypothesis is "greater than," we want to know how often we'd see a test statistic at or above what we observed.

p-value = $P (z \geq x)$

This is the area in the right tail of the standard normal distribution.

Example:

$H_{a} : p > 0.30$
Test statistic: $z = 2.15$
p-value = $P (z \geq 2.15) = 0.0158$

Interpretation: If the true proportion were $0.30$, there's about a $1.58\%$ chance of getting a test statistic of $2.15$ or higher.

One-Sided Alternative: Less Than Hₐ: p < p₀

When the alternative hypothesis is "less than," we want to know how often we'd see a test statistic at or below what we observed.

p-value = $P (z \leq x)$

This is the area in the left tail of the standard normal distribution.

Example:

$H_{a} : p < 0.50$
Test statistic: $z = - 1.83$
p-value = $P (z \leq - 1.83) = 0.0336$

Interpretation: If the true proportion were $0.50$, there's about a $3.36\%$ chance of getting a test statistic of $-1.83$ or lower.

Two-Sided Alternative: Not Equal To Hₐ: p ≠ p₀

When the alternative hypothesis is "not equal to," we care about extreme values in either direction. We want test statistics that are far from $0$ in either the positive or negative direction.

p-value = $P (z \leq -|x|) + P (z \geq |x|)$

This is the area in both tails of the standard normal distribution.

Equivalently: p-value = $2 \cdot P (z \geq |x|)$ (by symmetry of the standard normal distribution)

Example:

$H_{a} : p \neq 0.60$
Test statistic: $z = - 2.31$
p-value = $P (z \leq - 2.31) + P (z \geq 2.31) = 0.0104 + 0.0104 = 0.0208$

Interpretation: If the true proportion were $0.60$, there's about a $2.08\%$ chance of getting a test statistic at least as far from $0$ as $-2.31$ (in either direction).

Visual Understanding

For a two-sided test, we look at both tails because we're testing for a difference in either direction. If our test statistic is $z = 2.5$ , we also consider $z = - 2.5$ equally extreme, so we include both tail areas.
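The three tail-area rules can be sketched in a few lines of Python. This uses `statistics.NormalDist` from the standard library as the standard normal model; the function name `z_p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are illustrative choices, not a standard API.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_p_value(z: float, alternative: str) -> float:
    """p-value for a z statistic; alternative is 'greater', 'less', or 'two-sided'."""
    if alternative == "greater":
        return 1 - STANDARD_NORMAL.cdf(z)             # right-tail area
    if alternative == "less":
        return STANDARD_NORMAL.cdf(z)                 # left-tail area
    if alternative == "two-sided":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))  # both tails, by symmetry
    raise ValueError(f"unknown alternative: {alternative!r}")

print(round(z_p_value(2.15, "greater"), 4))     # 0.0158
print(round(z_p_value(-1.83, "less"), 4))       # 0.0336
print(round(z_p_value(-2.31, "two-sided"), 4))  # 0.0209
```

The two-sided result prints as $0.0209$ rather than $0.0208$ because the code doubles the unrounded tail area, while the worked example adds two tail areas that were each rounded to $0.0104$ first.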

Finding p-Values Using Technology or Tables

In practice, we use technology (calculators, software) or standard normal tables to find p-values.

Using a Standard Normal Table

Standard normal tables typically give $P (z < x)$ (area to the left).

For $H_{a} : p < p_{0}$ :

Look up the test statistic directly
The table value is the p-value

For $H_{a} : p > p_{0}$ :

Look up the test statistic
p-value = $1 - \text{table value}$

For $H_{a} : p \neq p_{0}$ :

Look up $|z|$
p-value = $2 (1 - \text{table value})$

Using Technology

Most calculators and statistical software have built-in functions:

They calculate the test statistic automatically
They provide the p-value directly
They account for the direction of the alternative hypothesis

Important: Always verify that your technology is using the correct alternative hypothesis direction!
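As a rough sketch of what such a built-in function does, the code below computes a one-sample z-test for a proportion from raw counts. It assumes the usual test statistic $z = (\hat{p} - p_{0}) / \sqrt{p_{0}(1 - p_{0})/n}$; the function name, the argument names, and the data ($94$ successes in $200$ trials) are hypothetical and not taken from any particular calculator or package.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float, alternative: str):
    """Return (z, p_value) for H0: p = p0, using the normal approximation."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # computed under H0, so it uses p0
    z = (p_hat - p0) / standard_error
    left_area = NormalDist().cdf(z)
    if alternative == "greater":
        p_value = 1 - left_area
    elif alternative == "less":
        p_value = left_area
    elif alternative == "two-sided":
        p_value = 2 * min(left_area, 1 - left_area)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value

# Hypothetical data: 94 successes in 200 trials, testing H0: p = 0.40.
z, p_greater = one_proportion_z_test(94, 200, 0.40, "greater")
_, p_less = one_proportion_z_test(94, 200, 0.40, "less")
print(f"z = {z:.2f}, p-value (greater) = {p_greater:.4f}, p-value (less) = {p_less:.4f}")
```

Running the same data with the wrong direction changes the p-value from roughly $0.02$ to roughly $0.98$, which is why the direction setting deserves a second look.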

Interpreting p-Values in Context

A proper interpretation of a p-value must include several components:

The Complete Interpretation Template

"The p-value of [value] is the probability of obtaining a sample proportion of [observed value] or [more/less extreme in direction of $H_{a}$ ], assuming that the true population proportion [state $H_{0}$ in context]."

Example 1: One-Sided (Greater)

Context: Testing if more than $40\%$ of voters support a proposal

$H_{0} : p = 0.40$
$H_{a} : p > 0.40$
Sample: $\hat{p} = 0.47$
p-value = $0.023$

Good interpretation: "The p-value of $0.023$ is the probability of obtaining a sample proportion of $0.47$ or greater, assuming that the true proportion of all voters who support the proposal is $0.40$ ."

Example 2: One-Sided (Less)

Context: Testing if less than 20% of items are defective

$H_{0} : p = 0.20$
$H_{a} : p < 0.20$
Sample: $\hat{p} = 0.14$
p-value = $0.087$

Good interpretation: "The p-value of $0.087$ is the probability of obtaining a sample proportion of $0.14$ or less, assuming that the true proportion of all items that are defective is $0.20$ ."

Example 3: Two-Sided

Context: Testing if the proportion has changed from 0.65

$H_{0} : p = 0.65$
$H_{a} : p \neq 0.65$
Sample: $\hat{p} = 0.58$
p-value = $0.042$

Good interpretation: "The p-value of $0.042$ is the probability of obtaining a sample proportion at least as far from $0.65$ as $0.58$ is (in either direction), assuming that the true proportion is $0.65$."

What to Include

Every p-value interpretation should include:

The p-value itself (the number)
What was observed (the sample statistic)
The direction that counts as extreme (based on $H_{a}$)
The assumption ("assuming $H_{0}$ is true" in context)

What NOT to Say

❌ "The probability that the null hypothesis is true is $0.023$ "

❌ "There's a $2.3\%$ chance that $p = 0.40$ "

❌ "The probability the alternative is true is $0.977$ "

❌ "The null hypothesis is false with probability $0.977$ "

Small p-Values: Evidence for the Alternative

Small p-values indicate that the observed test statistic would be unusual if the null hypothesis were true.

The Logic

If we observe something that would be very rare under $H_{0}$ , we have two options:

$H_{0}$ is true, and we just witnessed a rare event
$H_{0}$ is false, which explains why we saw what we saw

Small p-values suggest option 2 is more reasonable.

What "Small" Means

The smaller the p-value, the more convincing the statistical evidence for the alternative hypothesis.

Common benchmarks (significance levels):

$p < 0.01$ : Very strong evidence against $H_{0}$
$p < 0.05$ : Strong evidence against $H_{0}$
$p < 0.10$ : Moderate evidence against $H_{0}$

Interpreting Small p-Values

When the p-value is small:

The observed data would be unusual if $H_{0}$ were true
We have convincing statistical evidence for $H_{a}$
The data are not consistent with $H_{0}$
We typically reject $H_{0}$ (formal decision in Topic 6.6)

Example: p-value = $0.008$. "A sample result this extreme would occur only about $0.8\%$ of the time if the null hypothesis were true. This provides strong evidence for the alternative hypothesis."

Large p-Values: Lack of Evidence

p-values that are not small indicate that the observed test statistic would not be unusual if the null hypothesis were true.

The Logic

If we observe something that's reasonably likely under $H_{0}$ , we don't have reason to doubt $H_{0}$ .

Important: This does NOT mean $H_{0}$ is true! It means our data are consistent with $H_{0}$ .

What Large p-Values Tell Us

When the p-value is large:

The observed data would be reasonably likely if $H_{0}$ were true
We do not have convincing statistical evidence for $H_{a}$
The data are consistent with $H_{0}$
We typically fail to reject $H_{0}$ (formal decision in Topic 6.6)

What Large p-Values Do NOT Tell Us

❌ Large p-values do not provide evidence that $H_{0}$ is true

❌ Large p-values do not prove $H_{0}$

❌ Large p-values do not mean $H_{a}$ is false

The key principle: Lack of evidence against $H_{0}$ is not the same as evidence for $H_{0}$ .

An Analogy

If a jury finds a defendant "not guilty," that doesn't mean the defendant is innocent. It means there wasn't sufficient evidence for conviction. Similarly, failing to reject $H_{0}$ means we lack sufficient evidence against it—not that it's true.

Example: p-value = $0.34$. "A sample result this extreme would occur about $34\%$ of the time if the null hypothesis were true. This does not provide convincing evidence for the alternative hypothesis, though it doesn't prove the null hypothesis is true either."

Common Misconceptions About p-Values

Misconception 1: The p-value is the probability H₀ is true

WRONG: "The p-value of 0.05 means there's a 5% chance the null hypothesis is true."

CORRECT: The p-value is the probability of observing data like ours if $H_{0}$ is true. It's not the probability that $H_{0}$ itself is true.

Misconception 2: Large p-values prove H₀

WRONG: "The p-value is $0.62$ , so the null hypothesis is true."

CORRECT: A large p-value means the data are consistent with $H_{0}$ , but many other parameter values might also be consistent with the data.
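To see why a large p-value cannot single out $H_{0}$, the sketch below tests one hypothetical sample ($52$ successes in $100$ trials, not from the text) against several different null values using a two-sided z-test. The helper name `two_sided_p_value` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(successes: int, n: int, p0: float) -> float:
    """Two-sided p-value for H0: p = p0, using the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical sample: 52 successes in 100 trials, tested against several nulls.
candidate_nulls = [0.45, 0.50, 0.55, 0.58]
p_values = {p0: two_sided_p_value(52, 100, p0) for p0 in candidate_nulls}
for p0, p in p_values.items():
    print(f"H0: p = {p0:.2f} -> p-value = {p:.3f}")
```

Every one of these null values gives a p-value above $0.10$, so the same data are consistent with all of them. The data cannot show that any one of them is the true value.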

Misconception 3: The p-value is the probability of making an error

WRONG: "A p-value of $0.03$ means there's a $3\%$ chance we're making a mistake."

CORRECT: The p-value is about the probability of observing our data given $H_{0}$ , not about the probability of making an error in our decision. (Type I error probability is the significance level $α$ , which we'll discuss in Topic 6.6.)

Misconception 4: p = 0.051 and p = 0.049 are drastically different

WRONG: " $p = 0.049$ is significant but $p = 0.051$ is not, so they lead to completely different conclusions."

CORRECT: p-values near the significance level are borderline cases. The difference between $0.049$ and $0.051$ is negligible: both represent essentially the same strength of evidence against $H_{0}$. Don't treat the significance level as a rigid boundary.
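One way to see how close these two results are is to convert each p-value back into the test statistic that would produce it. The sketch below assumes a one-sided "greater than" z-test and uses `NormalDist.inv_cdf` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# z statistic whose right-tail area equals each p-value
z_for_049 = standard_normal.inv_cdf(1 - 0.049)
z_for_051 = standard_normal.inv_cdf(1 - 0.051)

print(f"p = 0.049 corresponds to z = {z_for_049:.3f}")
print(f"p = 0.051 corresponds to z = {z_for_051:.3f}")
print(f"difference in z = {z_for_049 - z_for_051:.3f}")
```

The two test statistics differ by only about $0.02$ standard errors, far too small a gap to justify opposite conclusions.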

The Relationship Between p-Values and Strength of Evidence

p-values exist on a continuum. Here's a rough guide:

p-value range: strength of evidence against $H_{0}$
$p < 0.01$ : Very strong evidence
$0.01 \leq p < 0.05$ : Strong evidence
$p \geq 0.10$ : Little to no evidence

Remember: These are guidelines, not rigid rules. Context matters, and the formal decision-making process (Topic 6.6) uses a predetermined significance level.

Simulated Null Distributions

Essential Knowledge 3.A.1 mentions that if the null distribution has been simulated, the p-value is found as a proportion rather than a probability from a theoretical distribution.

How Simulation Works

Assume $H_{0}$ is true
Generate many random samples (e.g., 1000 or 10,000) under this assumption
Calculate the test statistic for each simulated sample
Create a distribution of these test statistics (the simulated null distribution)
Find where the observed test statistic falls in this distribution

Finding p-Values from Simulations

For $H_{a} : p > p_{0}$ : p-value = proportion of simulated statistics $\geq$ observed statistic

For $H_{a} : p < p_{0}$ : p-value = proportion of simulated statistics $\leq$ observed statistic

For $H_{a} : p \neq p_{0}$ : p-value = proportion of simulated statistics $\geq |\text{observed}|$ + proportion of simulated statistics $\leq -|\text{observed}|$

Example: We observe $z = 2.3$ with $H_{a} : p > 0.50$

Generate 10,000 samples assuming $p = 0.50$
Calculate test statistic for each
Count how many are $\geq 2.3$
Suppose 180 of the 10,000 are $\geq 2.3$
p-value = $\frac{180}{10000} = 0.018$
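The simulation steps can be sketched with Python's standard `random` module. The text does not give a sample size, so $n = 100$ below is an assumption, as are the seed and the function name `simulate_null_z`. The simulated count will therefore not match the illustrative $180$ above.

```python
import random
from math import sqrt

def simulate_null_z(n: int, p0: float, num_sims: int, rng: random.Random) -> list:
    """Simulate z statistics for samples of size n drawn with H0: p = p0 true."""
    standard_error = sqrt(p0 * (1 - p0) / n)
    z_values = []
    for _ in range(num_sims):
        # One simulated sample: count successes among n independent trials.
        successes = sum(rng.random() < p0 for _ in range(n))
        z_values.append((successes / n - p0) / standard_error)
    return z_values

rng = random.Random(2024)  # fixed seed so the run is repeatable
null_z = simulate_null_z(n=100, p0=0.50, num_sims=10_000, rng=rng)

observed_z = 2.3
# Ha: p > 0.50, so count simulated statistics at or above the observed one.
p_value = sum(z >= observed_z for z in null_z) / len(null_z)
print(p_value)
```

Because the p-value here is a proportion of simulated results, it changes slightly with the seed and the number of simulations; more simulations give a more stable estimate.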

Putting It All Together

Understanding p-values is about understanding this logical chain:

We assume $H_{0}$ is true
Under this assumption, we know the null distribution
We calculate how unlikely our observed data would be
If very unlikely (small p-value), we have evidence against $H_{0}$
If reasonably likely (large p-value), we lack evidence against $H_{0}$

The p-value is the bridge between our sample data and our conclusion about the population.
