Introduction
We've learned to set up hypothesis tests (Topic 6.4) and understand p-values (Topic 6.5). Now it's time to carry out the complete hypothesis testing procedure and make formal decisions about population parameters.
Topic 6.6 is where hypothesis testing comes alive. This is where we calculate test statistics, find p-values, make formal decisions, and draw conclusions that answer real-world research questions. Understanding this topic means you can evaluate claims about populations, assess scientific studies, and make data-driven decisions.
This is one of the most important topics in the entire course. The hypothesis testing framework you learn here will be used throughout statistics and forms the backbone of scientific research, quality control, medical studies, and policy decisions.
The Test Statistic for a Population Proportion
The test statistic is a standardized value that measures how far our sample result is from what we'd expect if the null hypothesis were true.
The Formula
For testing a population proportion, the test statistic is:
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
Where:
- $\hat{p}$ = sample proportion
- $p_0$ = null hypothesized value (the value in $H_0$)
- $n$ = sample size
Understanding the Formula
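This formula has a familiar structure that appears throughout statistics: $\text{test statistic} = \dfrac{\text{statistic} - \text{parameter}}{\text{standard error of the statistic}}$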
Numerator ($\hat{p} - p_0$): How far is our sample proportion from the null value?
- Large differences (positive or negative) suggest the null might be wrong
- Small differences are consistent with the null being true
Denominator ($\sqrt{p_0(1 - p_0)/n}$): The standard error based on the null hypothesis.
- This measures the typical variation we'd expect in $\hat{p}$ if $H_0$ were true
- Notice we use $p_0$ (not $\hat{p}$) because we are assuming the null is true
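As a minimal sketch, the test statistic described above can be computed in a few lines of Python. The function name `z_statistic` and the sample numbers are hypothetical, chosen only for illustration; they do not come from the examples in this guide.

```python
from math import sqrt


def z_statistic(p_hat: float, p0: float, n: int) -> float:
    """One-sample z statistic for a proportion, using the null standard error."""
    standard_error = sqrt(p0 * (1 - p0) / n)  # built from p0, not p_hat
    return (p_hat - p0) / standard_error


# Hypothetical data: 45 successes in a sample of 100, null value 0.50.
z = z_statistic(p_hat=45 / 100, p0=0.50, n=100)
print(round(z, 2))  # -1.0
```

Here the sample proportion sits one standard error below the null value, so by itself it would not cast much doubt on the null hypothesis.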
Interpreting the Test Statistic
The z-statistic tells us how many standard errors our sample proportion is from the null hypothesized value.
- $z = 0$: Sample proportion equals null value exactly
- $z > 0$: Sample proportion is $z$ standard errors above the null value
- $z < 0$: Sample proportion is $|z|$ standard errors below the null value
Large absolute values of z (far from $0$) suggest the null hypothesis is questionable.
Example Calculation
A company claims of customers use their mobile app (). In a random sample of customers, use the app.
Step 1: Calculate sample proportion
Step 2: Calculate test statistic
Interpretation: The sample proportion is standard errors below the claimed proportion of .
The Null Distribution and Standard Normal
When the null hypothesis is true and all conditions are met, the test statistic follows a standard normal distribution (mean = $0$, standard deviation = $1$).
Why the Standard Normal Distribution?
Remember the sampling distribution of $\hat{p}$ is approximately normal when conditions are met:
- Mean: $\mu_{\hat{p}} = p_0$ (when $H_0$ is true)
- Standard deviation: $\sigma_{\hat{p}} = \sqrt{p_0(1 - p_0)/n}$
When we standardize by subtracting the mean and dividing by the standard deviation, we get the standard normal distribution (z-distribution).
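One way to see this is with a small simulation. The sketch below assumes a hypothetical null value of 0.40 and a sample size of 200 (neither taken from this guide), draws many random samples with the null hypothesis true, and standardizes each sample proportion. If the reasoning above holds, the standardized values should have a mean near 0 and a standard deviation near 1.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical settings: null value, sample size, number of simulated samples.
p0, n, reps = 0.40, 200, 5000
se = sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis

z_values = []
for _ in range(reps):
    # Draw one sample of size n from a population where p really equals p0.
    successes = sum(random.random() < p0 for _ in range(n))
    z_values.append((successes / n - p0) / se)

print("mean of z:", round(mean(z_values), 2))
print("sd of z:  ", round(stdev(z_values), 2))
```

The printed values will not be exactly 0 and 1 because the simulation is random and the normal model is only an approximation.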
Using the Null Distribution
The null distribution (standard normal) allows us to:
- Calculate probabilities (p-values)
- Determine how unusual our test statistic is
- Make decisions about the null hypothesis
This is why we use standard normal tables or technology—they give us areas under the standard normal curve, which are our p-values.
Finding p-Values from the Standard Normal Distribution
Once we have the test statistic, we find the p-value using the standard normal distribution. The method depends on the alternative hypothesis.
Review: The Three Cases
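For $H_a: p > p_0$ (one-sided, right tail): p-value = $P(Z \geq z)$
For $H_a: p < p_0$ (one-sided, left tail): p-value = $P(Z \leq z)$
For $H_a: p \neq p_0$ (two-sided): p-value = $2 \cdot P(Z \geq |z|)$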
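The three cases can be sketched with Python's standard library, which provides the standard normal curve through `statistics.NormalDist`. The function name `p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are naming assumptions made for this illustration.

```python
from statistics import NormalDist


def p_value(z: float, alternative: str) -> float:
    """Area under the standard normal curve matching the alternative hypothesis."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":    # Ha: p > p0, right tail
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # Ha: p < p0, left tail
        return std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: p != p0, both tails
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# Hypothetical test statistic, used only to show the three calculations.
print(round(p_value(1.96, "greater"), 3))    # 0.025
print(round(p_value(1.96, "less"), 3))       # 0.975
print(round(p_value(1.96, "two-sided"), 3))  # 0.05
```

The same z value gives very different p-values depending on the alternative, which is why the alternative hypothesis has to be fixed before looking at the data.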
Using Technology
Most calculators and statistical software provide p-values directly:
- Input the data or summary statistics
- Specify the null hypothesized value
- Specify the alternative hypothesis direction
- The software calculates both z and the p-value
Common calculator functions:
- TI-84: 1-PropZTest
- Output includes test statistic (z) and p-value
Using Tables
Standard normal tables give $P(Z \leq z)$ (area to the left).
Example 1 (right-tail): ,
- Table gives
- p-value =
Example 2 (left-tail): ,
- Table gives
- p-value =
Example 3 (two-tail): ,
- Area in right tail =
- p-value =
The Significance Level
The significance level, denoted by $\alpha$, is the predetermined probability of rejecting the null hypothesis given that it is true.
Understanding Significance Level
The significance level is the threshold we set before conducting the test for what counts as "convincing evidence."
Common values:
- $\alpha = 0.05$ ($5\%$): Most common in practice
- $\alpha = 0.01$ ($1\%$): More stringent, used when false positives are costly
- $\alpha = 0.10$ ($10\%$): More lenient, used in exploratory research
Setting the Significance Level
The significance level should be determined before collecting or analyzing data. It represents how much risk of a false positive (Type I error) we're willing to accept.
Factors influencing choice of $\alpha$:
- Consequences of incorrectly rejecting $H_0$ (more serious = smaller $\alpha$)
- Field conventions (the customary $\alpha$ differs between fields such as medical research and the social sciences)
- Balance between Type I and Type II errors
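To make the "probability of rejecting a true null hypothesis" idea concrete, the sketch below computes that probability exactly for one hypothetical setup: a two-sided test of a null value of 0.50 with a sample of 100 at a significance level of 0.05. These numbers are assumptions for illustration only. It walks through every possible success count, checks whether the test would reject, and adds up the binomial probabilities of the rejecting counts.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical setup: null value, sample size, significance level.
p0, n, alpha = 0.50, 100, 0.05
se = sqrt(p0 * (1 - p0) / n)
std_normal = NormalDist()

type_i_rate = 0.0
for x in range(n + 1):  # every possible number of successes
    z = (x / n - p0) / se
    p_val = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
    if p_val <= alpha:
        # Probability of seeing exactly x successes when H0 is true.
        type_i_rate += comb(n, x) * p0**x * (1 - p0) ** (n - x)

print(round(type_i_rate, 3))
```

The result is close to the chosen significance level but not identical to it, because success counts are whole numbers and the normal curve is only an approximation.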
The Significance Level vs. p-Value
These are often confused but are fundamentally different:
Significance level ($\alpha$):
- Predetermined threshold
- Set before data collection
- Reflects how much evidence we require
p-value:
- Calculated from data
- Measures the actual evidence against $H_0$
- Compared to $\alpha$ to make a decision
Making the Formal Decision
The relationship between the p-value and the significance level determines whether a result is statistically significant.
The Decision Rule
If p-value $\leq \alpha$: Reject $H_0$
If p-value $> \alpha$: Fail to reject $H_0$
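The decision rule is simple enough to express as a one-line comparison. This sketch assumes the common convention that a p-value exactly equal to the significance level counts as a rejection; the function name `decide` and the example values are hypothetical.

```python
def decide(p_value: float, alpha: float) -> str:
    """Formal decision: compare the p-value with the significance level."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical p-values, both compared with alpha = 0.05.
print(decide(0.03, 0.05))  # reject H0
print(decide(0.12, 0.05))  # fail to reject H0
```

Note that the function only ever returns one of two outcomes. There is deliberately no "accept H0" branch.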
What "Reject $H_0$" Means
When we reject the null hypothesis:
- There is convincing statistical evidence to support the alternative hypothesis
- The observed data would be very unusual if $H_0$ were true
- We conclude the population parameter is likely different from the null value
Example: , , p-value = ,
- Since , we reject
- We have convincing evidence that
What "Fail to Reject $H_0$" Means
When we fail to reject the null hypothesis:
- There is not convincing statistical evidence to support the alternative hypothesis
- The observed data are reasonably consistent with $H_0$
- We do not have sufficient evidence to conclude the parameter differs from the null value
Example: , , p-value = ,
- Since , we fail to reject
- We do not have convincing evidence that differs from
Critical Distinction: We Never "Accept" or "Prove"
IMPORTANT: A hypothesis test can lead to:
- Rejecting the null hypothesis
- Failing to reject the null hypothesis
A hypothesis test CANNOT lead to:
- "Accepting" the null hypothesis
- "Proving" the null hypothesis is true
- "Concluding" the null hypothesis is correct
Why We Don't Accept
Failing to reject $H_0$ means our data are consistent with $H_0$, but that doesn't mean $H_0$ is true. Many other values might also be consistent with our data; we simply lack evidence to distinguish between them.
Analogy: In court, "not guilty" doesn't mean "innocent"; it means insufficient evidence for conviction. Similarly, failing to reject $H_0$ means insufficient evidence against it, not proof it's true.
The principle: Lack of evidence against $H_0$ is not the same as evidence for $H_0$.
Writing Conclusions in Context
A proper conclusion must be stated in context with several key components.
The Template for Conclusions
If we reject $H_0$: "Since the p-value [value] is less than $\alpha$ = [value], we reject $H_0$. We have convincing statistical evidence that [state $H_a$ in context of the problem]."
If we fail to reject $H_0$: "Since the p-value [value] is greater than $\alpha$ = [value], we fail to reject $H_0$. We do not have convincing statistical evidence that [state $H_a$ in context of the problem]."
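The two templates can be filled in mechanically once the p-value, the significance level, and a contextual statement of the alternative hypothesis are known. The sketch below is one hypothetical way to do that; the function name `write_conclusion` and the sample inputs are illustrative assumptions, not values from this guide.

```python
def write_conclusion(p_value: float, alpha: float, ha_in_context: str) -> str:
    """Fill in the conclusion template that matches the formal decision."""
    if p_value <= alpha:
        comparison = "less than" if p_value < alpha else "equal to"
        return (
            f"Since the p-value of {p_value} is {comparison} the significance "
            f"level of {alpha}, we reject H0. We have convincing statistical "
            f"evidence that {ha_in_context}."
        )
    return (
        f"Since the p-value of {p_value} is greater than the significance "
        f"level of {alpha}, we fail to reject H0. We do not have convincing "
        f"statistical evidence that {ha_in_context}."
    )


# Hypothetical inputs for illustration.
print(write_conclusion(
    0.02, 0.05,
    "the true proportion of voters who support the measure exceeds 0.50",
))
```

A template like this covers the comparison, the decision, and the evidence statement, but the contextual wording of the alternative hypothesis still has to be written by hand for each problem.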
Required Components
- Explicit comparison: State whether p-value $\leq \alpha$ or p-value $> \alpha$
- Formal decision: "Reject $H_0$" or "Fail to reject $H_0$"
- Evidence statement: "We have/do not have convincing statistical evidence"
- Alternative hypothesis in context: State what you're testing for, using the actual context
- Parameter and population: Reference the population proportion and population
Using Non-Definitive Language
Conclusions should use tentative, non-definitive language:
Good words to use:
- "suggests," "indicates," "provides evidence"
- "convincing statistical evidence"
- "appears to be," "seems to be"
Avoid definitive language:
- "proves," "establishes with certainty"
- "definitely is," "absolutely"
- "must be," "has to be"
Examples of Good Conclusions
Example 1: Testing if proportion of defective items exceeds 0.05
- $H_0: p = 0.05$, $H_a: p > 0.05$
- p-value = ,
Good conclusion: "Since the p-value of is less than the significance level of , we reject $H_0$. We have convincing statistical evidence that the true proportion of defective items produced by this machine exceeds $0.05$."
Example 2: Testing if customer satisfaction has changed from 0.72
- $H_0: p = 0.72$, $H_a: p \neq 0.72$
- p-value = ,
Good conclusion: "Since the p-value of is greater than the significance level of , we fail to reject $H_0$. We do not have convincing statistical evidence that the true proportion of satisfied customers differs from $0.72$."
Common Mistakes in Conclusions
Incorrect: "We accept the null hypothesis." Correct: "We fail to reject the null hypothesis."
Incorrect: "We proved that the proportion is greater than 0.40." Correct: "We have convincing evidence that the proportion is greater than 0.40."
Incorrect: "The null hypothesis is true." Correct: "We do not have convincing evidence against the null hypothesis."
Incorrect: "The result is significant." (without context) Correct: "We have convincing statistical evidence that the true proportion of voters who support the measure exceeds 0.50."
Connecting to the Investigative Question
The hypothesis test serves as statistical reasoning to support the answer to a research question.
From Statistics to Science
The hypothesis test provides the statistical justification, which then informs the practical or scientific conclusion.
Example investigative question: "Has the company's new customer service training improved satisfaction rates?"
Statistical work:
- Collect data on current satisfaction
- Test (old rate) vs.
- Get p-value =
- Reject at
Answer to investigative question: "Yes, there is convincing statistical evidence that the customer satisfaction rate has increased from the previous rate of . The data suggest the training program was effective."
The Bridge from Data to Decision
The hypothesis test tells us:
- Whether the data provide evidence (decision to reject or not)
- How strong the evidence is (p-value)
- Direction of the effect (from )
This statistical evidence then informs practical decisions:
- Should we implement the new policy?
- Should we continue using this supplier?
- Should we recommend the treatment?
Complete Example: Full Hypothesis Test
Scenario: A pharmaceutical company claims that of patients experience symptom relief with their medication. A consumer advocacy group suspects the true rate is lower. They conduct a study with a random sample of patients from thousands who have used the medication. Of these patients, experienced symptom relief. Test the advocacy group's suspicion at the significance level.
Step 1: State the parameter and hypotheses
Parameter: Let $p$ = the true proportion of all patients who experience symptom relief with this medication
Hypotheses:
- (one-sided because we suspect the rate is lower)
Step 2: Check conditions
Randomization: The problem states a random sample was selected from all patients who have used the medication. Condition is satisfied.
10% condition: The sample is less than $10\%$ of the thousands of patients who have used the medication. Condition is satisfied.
Large counts condition: ✓ ✓
All conditions are satisfied for a one-sample z-test for a population proportion.
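The three condition checks can also be written as a short helper. This is a sketch under stated assumptions: the function name `check_conditions`, its parameters, and the numbers in the usage example are all hypothetical and are not the values from the scenario above. Whether the sample was random cannot be computed, so it is passed in as a yes/no input.

```python
def check_conditions(n: int, p0: float, population_size: int, is_random: bool) -> dict:
    """Check the conditions for a one-sample z-test for a proportion."""
    return {
        "random": is_random,
        # Sample should be no more than 10% of the population.
        "ten_percent": n <= 0.10 * population_size,
        # Expected successes and failures under H0 should each be at least 10.
        "large_counts": n * p0 >= 10 and n * (1 - p0) >= 10,
    }


# Hypothetical study: 150 people sampled at random from 5000, null value 0.60.
conditions = check_conditions(n=150, p0=0.60, population_size=5000, is_random=True)
print(conditions)
print("All conditions met:", all(conditions.values()))
```

Note that the large counts check uses the null value `p0`, not the sample proportion, for the same reason the standard error does.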
Step 3: Calculate test statistic
Sample proportion:
Test statistic:
Step 4: Find p-value
Since (left-tail), the p-value is:
Using technology or tables: p-value
Step 5: Interpret p-value
"The p-value of is the probability of obtaining a sample proportion of or less in a random sample of patients, assuming that the true proportion of all patients who experience symptom relief is ."
Step 6: Make formal decision
Since p-value = , we reject .
Step 7: Write conclusion in context
"Since the p-value of is less than the significance level of , we reject $H_0$. We have convincing statistical evidence that the true proportion of all patients who experience symptom relief with this medication is less than . The data support the advocacy group's suspicion that the company's claim of relief is overstated."
Step 8: Answer the investigative question
The statistical evidence suggests that the pharmaceutical company's claim of symptom relief is not supported by the data. The true relief rate appears to be lower than claimed, which has important implications for patients and prescribing physicians.
Another Complete Example: Two-Sided Test
Scenario: A political analyst claims that exactly of voters in a district support increasing education funding. A researcher believes the actual support level may be different. In a random sample of voters from the district's registered voters, support the increase. Test at .
Solution
Parameter: Let $p$ = the true proportion of all registered voters in this district who support increasing education funding
Hypotheses:
- (two-sided: testing for any difference)
Conditions:
- Random sample: stated ✓
- 10%: ✓
- Large counts: and ✓
Test statistic:
p-value (two-sided):
Decision: Since , we fail to reject .
Conclusion: "Since the p-value of is greater than the significance level of , we fail to reject . We do not have convincing statistical evidence that the true proportion of registered voters who support increasing education funding differs from ."
Note: This is a borderline result! The p-value is very close to . The evidence against is moderate but not quite strong enough to reject at the level.
Understanding Statistical Significance
A result is statistically significant if we reject the null hypothesis (i.e., p-value $\leq \alpha$).
What Statistical Significance Means
Statistical significance indicates:
- The observed effect is unlikely to be due to chance alone
- We have evidence of a real difference or effect
- The result is inconsistent with the null hypothesis
What Statistical Significance Does NOT Mean
Statistical significance does NOT indicate:
- The effect is large or important
- The result is practically meaningful
- The result is certainly true
- The null hypothesis is definitely false
Statistical vs. Practical Significance
An effect can be:
- Statistically significant but not practically important: With huge samples, tiny differences can be statistically significant but meaningless in practice
- Practically important but not statistically significant: With small samples, large effects might not reach statistical significance
Example: A weight loss program reduces average weight by $0.5$ pounds, and the test gives a very small p-value
- Statistically significant (very small p-value)
- Not practically meaningful (half a pound is negligible)
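The effect of sample size can be illustrated with proportions, the setting used in the rest of this topic. The sketch below assumes a hypothetical null value of 0.50 and a sample proportion of 0.505, a difference of half a percentage point, and compares the two-sided p-value for a modest sample with that for an enormous one. All numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(p_hat: float, p0: float, n: int) -> float:
    """Two-sided p-value for a one-sample z-test for a proportion."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny difference (0.505 vs 0.50), two hypothetical sample sizes.
p_small_sample = two_sided_p_value(0.505, 0.50, 400)
p_huge_sample = two_sided_p_value(0.505, 0.50, 1_000_000)

print(f"n = 400:       p-value = {p_small_sample:.3f}")
print(f"n = 1,000,000: p-value = {p_huge_sample:.3g}")
```

The observed difference is identical in both cases; only the sample size changes. With the huge sample the result is statistically significant even though a half-point difference may not matter in practice.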
The Logic of Hypothesis Testing: A Summary
The hypothesis testing procedure follows this logical flow:
- Assume the null hypothesis is true
- Calculate how likely our observed data would be under this assumption
- Evaluate whether the data are consistent with the assumption
- Decide whether to reject the assumption based on the evidence
- Conclude in the context of the research question
This framework provides a standardized, objective method for making decisions based on data while acknowledging uncertainty.
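The logical flow above can be sketched end to end as a single function. Everything named here (`one_prop_z_test`, the `alternative` labels, the returned dictionary keys) is an assumption made for illustration, and the data in the usage example are hypothetical. The sketch also assumes the conditions for the test have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(successes: int, n: int, p0: float,
                    alternative: str, alpha: float = 0.05) -> dict:
    """Run a one-sample z-test for a proportion and return the key results."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # assumes H0 is true

    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return {"p_hat": p_hat, "z": z, "p_value": p_value, "decision": decision}


# Hypothetical data: 50 successes out of 100, testing H0: p = 0.60 vs Ha: p < 0.60.
result = one_prop_z_test(successes=50, n=100, p0=0.60, alternative="less")
print(round(result["z"], 2), round(result["p_value"], 4), result["decision"])
```

The function stops at the formal decision. Writing the conclusion in context and answering the investigative question still require judgment about the actual setting.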
Common Pitfalls and How to Avoid Them
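Pitfall 1: Using $\hat{p}$ instead of $p_0$ in the test statistic
Incorrect: $z = \dfrac{\hat{p} - p_0}{\sqrt{\hat{p}(1 - \hat{p})/n}}$
Correct: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
The denominator must use $p_0$ because we're assuming $H_0$ is true.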
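A quick numerical comparison shows that the choice of denominator changes the answer. The values below are hypothetical (a sample proportion of 0.30, a null value of 0.40, and a sample of 100) and serve only to illustrate the pitfall.

```python
from math import sqrt

# Hypothetical values for illustration.
p_hat, p0, n = 0.30, 0.40, 100

# Incorrect: standard error built from the sample proportion.
z_wrong = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)

# Correct: standard error built from the null value, since H0 is assumed true.
z_right = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

print(round(z_wrong, 2), round(z_right, 2))  # -2.18 -2.04
```

The two statistics differ, so the p-values differ as well, and near a significance threshold that can change the formal decision.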
Pitfall 2: Choosing the alternative hypothesis after seeing the data
The alternative hypothesis must be determined by the research question before data analysis, not based on what direction the sample data happened to go.
Pitfall 3: Saying "accept $H_0$" instead of "fail to reject $H_0$"
We never accept or prove the null hypothesis. At best, we fail to find evidence against it.
Pitfall 4: Confusing p-value with significance level
- $\alpha$ is set before the study (threshold)
- p-value is calculated from data (evidence)
- They serve different roles
Pitfall 5: Omitting context in conclusions
Always state what the test is about, referencing the specific population and variable, not just abstract statistical decisions.
