Introduction
In Topic 6.4, we learned to set up hypothesis tests by stating null and alternative hypotheses and checking conditions. Now comes the heart of hypothesis testing: the p-value.
The p-value is perhaps the most important, and most misunderstood, concept in statistical inference. Compared against a significance level, it is the number that determines whether we reject the null hypothesis. It's reported in virtually every research paper, clinical trial, and scientific study, yet it's frequently misinterpreted, even by professionals.
Understanding p-values correctly is essential not just for AP Statistics, but for being a critical consumer of research and data. This topic will clarify what p-values actually mean, how to find them, and most importantly, how to interpret them properly.
The Null Distribution
Before we can understand p-values, we need to understand the null distribution.
What is the Null Distribution?
The null distribution is the probability distribution of the test statistic assuming the null hypothesis is true.
Think of it this way: If H₀ were actually true, what values would we expect to see for our test statistic? The null distribution answers this question.
Creating the Null Distribution
There are two main ways to obtain a null distribution:
- Theoretical approach: Use a probability model (like the standard normal distribution) based on mathematical theory
- Simulation approach: Generate many samples assuming H₀ is true and observe the distribution of the test statistic
For the one-sample z-test for a proportion, we use the theoretical approach. The null distribution is approximately a standard normal distribution (z-distribution) when conditions are met.
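As a rough check on the claim that the null distribution is approximately standard normal, the sketch below simulates z-statistics with H₀ true and summarizes them. The values p0 = 0.5, n = 200, and 5,000 repetitions are assumptions chosen for illustration, and the function name is hypothetical.

```python
import random
import statistics

def simulate_null_z(p0, n, reps, seed=0):
    """Simulate z-statistics for sample proportions drawn with H0 (p = p0) true."""
    rng = random.Random(seed)
    se = (p0 * (1 - p0) / n) ** 0.5  # standard error computed under H0
    z_values = []
    for _ in range(reps):
        # One simulated sample: count successes in n independent trials.
        successes = sum(rng.random() < p0 for _ in range(n))
        z_values.append((successes / n - p0) / se)
    return z_values

# Illustrative values only (not from the text).
z_sim = simulate_null_z(p0=0.5, n=200, reps=5000)
print(round(statistics.mean(z_sim), 2), round(statistics.stdev(z_sim), 2))
```

If the conditions hold, the printed mean should be close to 0 and the standard deviation close to 1, matching the standard normal model.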
Why It's Called the "Null" Distribution
We call it the "null" distribution because it describes what happens if the null hypothesis is true. Every calculation we do assumes H₀ is correct; that's the starting point for hypothesis testing.
What is a p-Value?
The p-value is the probability of obtaining a test statistic as extreme or more extreme than what we actually observed, given that the null hypothesis is true.
Breaking Down the Definition
Let's unpack each part:
"Probability of obtaining...": The p-value is a probability (between 0 and 1)
"...a test statistic as extreme or more extreme...": "Extreme" means far from what we'd expect if H₀ were true
"...than what we actually observed...": We compare to our actual sample result
"...given that the null hypothesis is true": This is crucial: the entire calculation assumes H₀ is correct
The Conditional Nature of p-Values
The p-value is a conditional probability. We're asking: "If the null hypothesis were true, what's the probability of seeing data like ours (or more extreme)?"
What a p-value IS: P(data at least this extreme | H₀ is true)
What a p-value is NOT: P(H₀ is true | data)
This distinction is critical. The p-value does NOT tell us the probability that the null hypothesis is true!
An Analogy
Imagine a friend claims they can predict coin flips. They predict out of flips correctly.
The p-value answers: "If they were just guessing randomly (like the null hypothesis), what's the probability they'd get or more correct just by chance?"
The p-value does NOT answer: "What's the probability they were guessing?"
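To make the analogy concrete, here is a minimal sketch of the calculation it describes, using the binomial distribution for a friend who is purely guessing. The counts used (8 correct out of 10 flips) are hypothetical stand-ins, not values from the text, and the function name is an assumption.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct guesses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers: 8 correct predictions in 10 flips, guessing at random.
p_value = prob_at_least(8, 10)
print(round(p_value, 4))  # 0.0547
```

With these assumed numbers, a pure guesser would do this well about 5% of the time. That figure says nothing directly about the probability that the friend was guessing.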
Finding p-Values for Different Alternative Hypotheses
How we calculate the p-value depends on the direction of the alternative hypothesis. Let's say our test statistic is z (some calculated value).
One-Sided Alternative: Greater Than Hₐ: p > p₀
When the alternative hypothesis is "greater than," we want to know how often we'd see a test statistic at or above what we observed.
p-value = P(Z ≥ z)
This is the area in the right tail of the standard normal distribution.
Example:
- Test statistic:
- p-value =
Interpretation: If the true proportion were , there's about a chance of getting a test statistic of or higher.
One-Sided Alternative: Less Than Hₐ: p < p₀
When the alternative hypothesis is "less than," we want to know how often we'd see a test statistic at or below what we observed.
p-value = P(Z ≤ z)
This is the area in the left tail of the standard normal distribution.
Example:
- Test statistic:
- p-value =
Interpretation: If the true proportion were , there's about a chance of getting a test statistic of or lower.
Two-Sided Alternative: Not Equal To Hₐ: p ≠ p₀
When the alternative hypothesis is "not equal to," we care about extreme values in either direction. We want test statistics that are far from 0 in either the positive or negative direction.
p-value = P(Z ≤ −|z|) + P(Z ≥ |z|)
This is the area in both tails of the standard normal distribution.
Equivalently: p-value = 2 · P(Z ≥ |z|) (by symmetry of the standard normal distribution)
Example:
- Test statistic:
- p-value =
Interpretation: If the true proportion were , there's about a chance of getting a test statistic at least as far from as (in either direction).
Visual Understanding
For a two-sided test, we look at both tails because we're testing for a difference in either direction. If our test statistic is z, we also consider −z equally extreme, so we include both tail areas.
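The three tail-area rules can be sketched in a few lines using Python's standard library. The labels "greater", "less", and "two-sided" and the test statistic 1.96 are assumptions made for this illustration.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative):
    """Tail area under the standard normal curve for a z test statistic.

    alternative: "greater", "less", or "two-sided" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        return 1 - std_normal.cdf(z)        # right tail: P(Z >= z)
    if alternative == "less":
        return std_normal.cdf(z)            # left tail: P(Z <= z)
    if alternative == "two-sided":
        return 2 * std_normal.cdf(-abs(z))  # both tails, using symmetry
    raise ValueError(f"unknown alternative: {alternative!r}")

z = 1.96  # hypothetical test statistic
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value_from_z(z, alt), 3))
```

For this assumed z, the right-tail area is about 0.025, the left-tail area about 0.975, and the two-sided area about 0.05, so the same test statistic can give very different p-values depending on the alternative.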
Finding p-Values Using Technology or Tables
In practice, we use technology (calculators, software) or standard normal tables to find p-values.
Using a Standard Normal Table
Standard normal tables typically give P(Z ≤ z) (area to the left).
For Hₐ: p < p₀:
- Look up the test statistic directly
- The table value is the p-value
For Hₐ: p > p₀:
- Look up the test statistic
- p-value = 1 − (table value)
For Hₐ: p ≠ p₀:
- Look up −|z|
- p-value = 2 × (table value)
Using Technology
Most calculators and statistical software have built-in functions:
- They calculate the test statistic automatically
- They provide the p-value directly
- They account for the direction of the alternative hypothesis
Important: Always verify that your technology is using the correct alternative hypothesis direction!
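To show why the direction setting matters, here is a minimal sketch of what such a built-in function might do internally, assuming the usual one-sample z statistic for a proportion with the standard error computed from p₀. The function name, the direction labels, and the data (58 successes in 100 trials, H₀: p = 0.5) are hypothetical.

```python
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative):
    """One-sample z-test for a proportion; returns (z, p_value)."""
    p_hat = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5  # standard error uses p0 because H0 is assumed true
    z = (p_hat - p0) / se
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * cdf(-abs(z))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value

# Hypothetical data: 58 successes in 100 trials, testing H0: p = 0.5.
for alt in ("greater", "less", "two-sided"):
    z, p = one_prop_z_test(58, 100, 0.5, alt)
    print(alt, round(z, 2), round(p, 4))
```

The test statistic is the same in all three runs, but the p-value changes with the chosen alternative, which is why the direction should be checked before trusting the output.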
Interpreting p-Values in Context
A proper interpretation of a p-value must include several components:
The Complete Interpretation Template
"The p-value of [value] is the probability of obtaining a sample proportion of [observed value] or [more/less extreme in direction of Hₐ], assuming that the true population proportion [state H₀ in context]."
Example 1: One-Sided (Greater)
Context: Testing if more than % of voters support a proposal
- Sample:
- p-value =
Good interpretation: "The p-value of is the probability of obtaining a sample proportion of or greater, assuming that the true proportion of all voters who support the proposal is ."
Example 2: One-Sided (Less)
Context: Testing if less than 20% of items are defective
- Sample:
- p-value =
Good interpretation: "The p-value of [value] is the probability of obtaining a sample proportion of [observed value] or less, assuming that the true proportion of all items that are defective is 0.20."
Example 3: Two-Sided
Context: Testing if the proportion has changed from 0.65
- Sample:
- p-value =
Good interpretation: "The p-value of [value] is the probability of obtaining a sample proportion as far or farther from 0.65 than [observed value] (in either direction), assuming that the true proportion is 0.65."
What to Include
Every p-value interpretation should include:
- The p-value itself (the number)
- What was observed (the sample statistic)
- The direction of extreme (based on Hₐ)
- The assumption ("assuming H₀ is true" in context)
What NOT to Say
❌ "The probability that the null hypothesis is true is "
❌ "There's a % chance that "
❌ "The probability the alternative is true is "
❌ "The null hypothesis is false with probability "
Small p-Values: Evidence for the Alternative
Small p-values indicate that the observed test statistic would be unusual if the null hypothesis were true.
The Logic
If we observe something that would be very rare under H₀, we have two options:
- H₀ is true, and we just witnessed a rare event
- H₀ is false, which explains why we saw what we saw
Small p-values suggest option 2 is more reasonable.
What "Small" Means
The smaller the p-value, the more convincing the statistical evidence for the alternative hypothesis.
Common benchmarks (significance levels):
- p-value < 0.01: Very strong evidence against H₀
- p-value < 0.05: Strong evidence against H₀
- p-value < 0.10: Moderate evidence against H₀
Interpreting Small p-Values
When the p-value is small:
- The observed data would be unusual if H₀ were true
- We have convincing statistical evidence for Hₐ
- The data are not consistent with H₀
- We typically reject H₀ (formal decision in Topic 6.6)
Example: p-value = "A sample result this extreme would occur only about of the time if the null hypothesis were true. This provides strong evidence for the alternative hypothesis."
Large p-Values: Lack of Evidence
p-values that are not small indicate that the observed test statistic would not be unusual if the null hypothesis were true.
The Logic
If we observe something that's reasonably likely under H₀, we don't have reason to doubt H₀.
Important: This does NOT mean H₀ is true! It means our data are consistent with H₀.
What Large p-Values Tell Us
When the p-value is large:
- The observed data would be reasonably likely if H₀ were true
- We do not have convincing statistical evidence for Hₐ
- The data are consistent with H₀
- We typically fail to reject H₀ (formal decision in Topic 6.6)
What Large p-Values Do NOT Tell Us
❌ Large p-values do not provide evidence that H₀ is true
❌ Large p-values do not prove H₀
❌ Large p-values do not mean Hₐ is false
The key principle: Lack of evidence against H₀ is not the same as evidence for H₀.
An Analogy
If a jury finds a defendant "not guilty," that doesn't mean the defendant is innocent. It means there wasn't sufficient evidence for conviction. Similarly, failing to reject H₀ means we lack sufficient evidence against it, not that it's true.
Example: p-value = "A sample result this extreme would occur about of the time if the null hypothesis were true. This does not provide convincing evidence for the alternative hypothesis, though it doesn't prove the null hypothesis is true either."
Common Misconceptions About p-Values
Misconception 1: The p-value is the probability H₀ is true
WRONG: "The p-value of 0.05 means there's a 5% chance the null hypothesis is true."
CORRECT: The p-value is the probability of observing data like ours (or more extreme) if H₀ is true. It's not the probability that H₀ itself is true.
Misconception 2: Large p-values prove H₀
WRONG: "The p-value is , so the null hypothesis is true."
CORRECT: A large p-value means the data are consistent with H₀, but many other parameter values might also be consistent with the data.
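One way to see this is to test the same sample against several different null values. The sketch below assumes a two-sided one-sample z-test for a proportion and a hypothetical sample of 52 successes in 100 trials; the candidate null values are also chosen only for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(successes, n, p0):
    """Two-sided p-value for H0: p = p0 using the one-sample z-test for a proportion."""
    p_hat = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5  # standard error under the assumed null value
    z = (p_hat - p0) / se
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical sample: 52 successes in 100 trials, tested against several null values.
for p0 in (0.45, 0.50, 0.55, 0.60):
    print(p0, round(two_sided_p_value(52, 100, p0), 3))
```

Every one of these null values produces a p-value well above 0.05 for this assumed sample, so a large p-value cannot single out any one of them as true.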
Misconception 3: The p-value is the probability of making an error
WRONG: "A p-value of means there's a chance we're making a mistake."
CORRECT: The p-value is about the probability of observing our data given H₀, not about the probability of making an error in our decision. (Type I error probability is the significance level α, which we'll discuss in Topic 6.6.)
Misconception 4: p = 0.051 and p = 0.049 are drastically different
WRONG: "p = 0.049 is significant but p = 0.051 is not, so they lead to completely different conclusions."
CORRECT: p-values near the significance level are borderline cases. The difference between 0.049 and 0.051 is negligible; both suggest weak to moderate evidence. Don't treat the significance level as a rigid boundary.
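A small numerical sketch shows how fragile a hard cutoff can be. It assumes a two-sided one-sample z-test for a proportion with H₀: p = 0.5 and n = 1000; the two success counts are hypothetical and differ by a single observation.

```python
from statistics import NormalDist

def two_sided_p_value(successes, n, p0):
    """Two-sided p-value for H0: p = p0 using the one-sample z-test for a proportion."""
    p_hat = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5
    z = (p_hat - p0) / se
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical samples that differ by one success out of 1000 trials.
for successes in (530, 531):
    p = two_sided_p_value(successes, 1000, 0.5)
    side = "below 0.05" if p < 0.05 else "above 0.05"
    print(successes, round(p, 4), side)
```

Under these assumptions, one extra success moves the p-value from just above 0.05 to just below it, even though the two samples carry essentially the same evidence.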
The Relationship Between p-Values and Strength of Evidence
p-values exist on a continuum. Here's a rough guide:
Remember: These are guidelines, not rigid rules. Context matters, and the formal decision-making process (Topic 6.6) uses a predetermined significance level.
Simulated Null Distributions
Essential Knowledge 3.A.1 mentions that if the null distribution has been simulated, the p-value is found as a proportion rather than a probability from a theoretical distribution.
How Simulation Works
- Assume H₀ is true
- Generate many random samples (e.g., 1000 or 10,000) under this assumption
- Calculate the test statistic for each simulated sample
- Create a distribution of these test statistics (the simulated null distribution)
- Find where the observed test statistic falls in this distribution
Finding p-Values from Simulations
For Hₐ: p > p₀: p-value = proportion of simulated statistics ≥ observed statistic
For Hₐ: p < p₀: p-value = proportion of simulated statistics ≤ observed statistic
For Hₐ: p ≠ p₀: p-value = proportion of simulated statistics ≥ |observed| + proportion ≤ −|observed|
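The simulation steps and the three proportion rules can be sketched together as follows. This is a minimal illustration, assuming the test statistic is the z-score of the sample proportion and that samples are generated as independent success/failure trials with probability p₀; the function name, the direction labels, and the data (58 successes in 100 trials, H₀: p = 0.5) are hypothetical.

```python
import random

def simulated_p_value(observed_successes, n, p0, alternative, reps=10_000, seed=1):
    """Estimate a p-value as the proportion of simulated statistics at least as extreme."""
    rng = random.Random(seed)
    se = (p0 * (1 - p0) / n) ** 0.5
    z_obs = (observed_successes / n - p0) / se
    tol = 1e-9  # guards against floating-point ties being missed
    count = 0
    for _ in range(reps):
        # Generate one sample with H0 true and compute its test statistic.
        successes = sum(rng.random() < p0 for _ in range(n))
        z_sim = (successes / n - p0) / se
        if alternative == "greater":
            extreme = z_sim >= z_obs - tol
        elif alternative == "less":
            extreme = z_sim <= z_obs + tol
        elif alternative == "two-sided":
            extreme = abs(z_sim) >= abs(z_obs) - tol
        else:
            raise ValueError(f"unknown alternative: {alternative!r}")
        count += extreme
    return count / reps

# Hypothetical data: 58 successes in 100 trials, H0: p = 0.5.
print(simulated_p_value(58, 100, 0.5, "greater"))
print(simulated_p_value(58, 100, 0.5, "two-sided"))
```

Because the result is a proportion from random samples, it varies slightly with the seed and the number of repetitions. More repetitions give a more stable estimate.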
Example: We observe with
- Generate 10,000 samples assuming H₀ is true
- Calculate test statistic for each
- Count how many are at least as extreme as the observed statistic
- Suppose 180 of the 10,000 are at least as extreme
- p-value = 180/10,000 = 0.018
Putting It All Together
Understanding p-values is about understanding this logical chain:
- We assume H₀ is true
- Under this assumption, we know the null distribution
- We calculate how unlikely our observed data would be
- If very unlikely (small p-value), we have evidence against H₀
- If reasonably likely (large p-value), we lack evidence against H₀
The p-value is the bridge between our sample data and our conclusion about the population.
