Introduction

We've learned to set up hypothesis tests (Topic 6.4) and understand p-values (Topic 6.5). Now it's time to carry out the complete hypothesis testing procedure and make formal decisions about population parameters.

Topic 6.6 is where hypothesis testing comes alive. This is where we calculate test statistics, find p-values, make formal decisions, and draw conclusions that answer real-world research questions. Understanding this topic means you can evaluate claims about populations, assess scientific studies, and make data-driven decisions.

This is one of the most important topics in the entire course. The hypothesis testing framework you learn here will be used throughout statistics and forms the backbone of scientific research, quality control, medical studies, and policy decisions.

The Test Statistic for a Population Proportion

The test statistic is a standardized value that measures how far our sample result is from what we'd expect if the null hypothesis were true.

The Formula

For testing a population proportion, the test statistic is:

$z = \frac{\hat{p} - p_{0}}{\sqrt{\frac{p_{0} ( 1 - p_{0} )}{n}}}$

Where:

$\hat{p}$ = sample proportion
$p_{0}$ = null hypothesized value (the value in $H_{0}$ )
$n$ = sample size

Understanding the Formula

This formula has a familiar structure that appears throughout statistics:

$\text{test statistic} = \frac{\text{observed} - \text{expected}}{\text{standard error of statistic}}$

Numerator $(\hat{p} - p_{0})$ : How far is our sample proportion from the null value?

Large differences (positive or negative) suggest the null might be wrong
Small differences are consistent with the null being true

Denominator $\sqrt{\frac{p_{0} ( 1 - p_{0} )}{n}}$ : The standard error based on the null hypothesis.

This measures the typical variation we'd expect in $\hat{p}$ if $H_{0}$ were true
Notice we use $p_{0}$ (not $\hat{p}$ ) because we are assuming the null is true

Interpreting the Test Statistic

The z-statistic tells us how many standard errors our sample proportion is from the null hypothesized value.

$z = 0$ : Sample proportion equals null value exactly
$z = 1.5$ : Sample proportion is $1.5$ standard errors above the null value
$z = - 2.3$ : Sample proportion is $2.3$ standard errors below the null value

Large absolute values of z (far from $0$ ) suggest the null hypothesis is questionable.

Example Calculation

A company claims $30\%$ of customers use their mobile app ( $p_{0} = 0.30$ ). In a random sample of $200$ customers, $48$ use the app.

Step 1: Calculate sample proportion $\hat{p} = \frac{48}{200} = 0.24$

Step 2: Calculate test statistic $z = \frac{0.24 - 0.30}{\sqrt{\frac{0.30 ( 0.70 )}{200}}}$

$z = \frac{- 0.06}{\sqrt{\frac{0.21}{200}}}$

$z = \frac{- 0.06}{\sqrt{0.00105}}$

$z = \frac{- 0.06}{0.0324}$

$z \approx - 1.85$

Interpretation: The sample proportion is $1.85$ standard errors below the claimed proportion of $0.30$ .
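The same arithmetic can be checked with a few lines of Python. This is a minimal sketch, not a standard library routine: the function name `z_statistic` and its parameter names are made up for illustration.

```python
from math import sqrt

def z_statistic(successes, n, p0):
    """Hypothetical helper: one-sample z statistic for a proportion.

    The standard error uses the null value p0, not the sample proportion.
    """
    p_hat = successes / n                      # sample proportion
    standard_error = sqrt(p0 * (1 - p0) / n)   # computed under H0
    return (p_hat - p0) / standard_error

# Mobile app example: 48 of 200 customers, claimed proportion 0.30
z = z_statistic(48, 200, 0.30)
print(round(z, 2))  # -1.85
```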

The Null Distribution and Standard Normal

When the null hypothesis is true and all conditions are met, the test statistic approximately follows a standard normal distribution (mean = $0$ , standard deviation = $1$ ).

Why the Standard Normal Distribution?

Remember the sampling distribution of $\hat{p}$ is approximately normal when conditions are met:

Mean: $μ_{\hat{p}} = p$
Standard deviation: $σ_{\hat{p}} = \sqrt{\frac{p ( 1 - p )}{n}}$

When we standardize $\hat{p}$ by subtracting its mean and dividing by its standard deviation, with $p = p_{0}$ because we assume the null hypothesis is true, we get the standard normal distribution (z-distribution).

Using the Null Distribution

The null distribution (standard normal) allows us to:

Calculate probabilities (p-values)
Determine how unusual our test statistic is
Make decisions about the null hypothesis

This is why we use standard normal tables or technology—they give us areas under the standard normal curve, which are our p-values.

Finding p-Values from the Standard Normal Distribution

Once we have the test statistic, we find the p-value using the standard normal distribution. The method depends on the alternative hypothesis.

Review: The Three Cases

For $H_{a} : p > p_{0}$ (one-sided, right tail): $\text{p-value} = P (z \geq \text{test statistic})$

For $H_{a} : p < p_{0}$ (one-sided, left tail): $\text{p-value} = P (z \leq \text{test statistic})$

For $H_{a} : p \neq p_{0}$ (two-sided): $\text{p-value} = 2 \cdot P (z \geq | \text{test statistic} |)$

Using Technology

Most calculators and statistical software provide p-values directly:

Input the data or summary statistics
Specify the null hypothesized value
Specify the alternative hypothesis direction
The software calculates both z and the p-value

Common calculator functions:

TI-84: 1-PropZTest
Output includes test statistic (z) and p-value

Using Tables

Standard normal tables give $P (z < \text{value})$ (area to the left).

Example 1 (right-tail): $z = 1.85$ , $H_{a} : p > p_{0}$

Table gives $P (z < 1.85) = 0.9678$
p-value = $1 - 0.9678 = 0.0322$

Example 2 (left-tail): $z = - 1.85$ , $H_{a} : p < p_{0}$

Table gives $P (z < - 1.85) = 0.0322$
p-value = $0.0322$

Example 3 (two-tail): $z = 1.85$ , $H_{a} : p \neq p_{0}$

Area in right tail = $1 - 0.9678 = 0.0322$
p-value = $2 (0.0322) = 0.0644$
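The three table lookups above can also be reproduced in software. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of a printed table; the function name `p_value` and the labels `"greater"`, `"less"`, and `"two-sided"` are illustrative choices, not a fixed convention.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """Hypothetical helper: p-value for a z statistic.

    alternative is "greater" (Ha: p > p0), "less" (Ha: p < p0),
    or "two-sided" (Ha: p != p0).
    """
    area_left = NormalDist().cdf  # standard normal: mean 0, sd 1
    if alternative == "greater":
        return 1 - area_left(z)             # right-tail area
    if alternative == "less":
        return area_left(z)                 # left-tail area
    if alternative == "two-sided":
        return 2 * (1 - area_left(abs(z)))  # both tails
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(round(p_value(1.85, "greater"), 4))    # 0.0322
print(round(p_value(-1.85, "less"), 4))      # 0.0322
print(round(p_value(1.85, "two-sided"), 4))  # 0.0643
```

The two-sided result prints as 0.0643 rather than the table-based 0.0644 because the table value 0.0322 was already rounded before it was doubled.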

The Significance Level

The significance level, denoted by $α$ , is the predetermined probability of rejecting the null hypothesis given that it is true.

Understanding Significance Level

The significance level is the threshold we set before conducting the test for what counts as "convincing evidence."

Common values:

$α = 0.05$ ( $5\%$ ): Most common in practice
$α = 0.01$ ( $1\%$ ): More stringent, used when false positives are costly
$α = 0.10$ ( $10\%$ ): More lenient, used in exploratory research

Setting the Significance Level

The significance level should be determined before collecting or analyzing data. It represents how much risk of a false positive (Type I error) we're willing to accept.

Factors influencing choice of $α$ :

Consequences of incorrectly rejecting $H_{0}$ (more serious = smaller $α$ )
Field conventions (medical research often uses $0.01$ ; social sciences often use $0.05$ )
Balance between Type I and Type II errors

The Significance Level vs. p-Value

These are often confused but are fundamentally different:

Significance level ( $α$ ):

Predetermined threshold
Set before data collection
Reflects how much evidence we require

p-value:

Calculated from data
Measures the actual evidence against $H_{0}$
Compared to $α$ to make a decision

Making the Formal Decision

The relationship between the p-value and the significance level determines whether a result is statistically significant.

The Decision Rule

If p-value $\leq α$ : Reject $H_{0}$

If p-value $> α$ : Fail to reject $H_{0}$
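The decision rule is simple enough to write as a one-line comparison. The sketch below uses a hypothetical function name, `decide`, and applies it to the p-values from the two examples in this section.

```python
def decide(p_value, alpha=0.05):
    """Hypothetical helper: apply the decision rule for a hypothesis test."""
    # Reject when the p-value is at or below the significance level.
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.023, alpha=0.05))  # reject H0
print(decide(0.18, alpha=0.05))   # fail to reject H0
```

Note that the function never returns "accept H0": the only two outcomes are rejecting or failing to reject.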

What "Reject $H_{0}$ " Means

When we reject the null hypothesis:

There is convincing statistical evidence to support the alternative hypothesis
The observed data would be very unusual if $H_{0}$ were true
We conclude the population parameter is likely different from the null value

Example: $H_{0} : p = 0.40$ , $H_{a} : p > 0.40$ , p-value = $0.023$ , $α = 0.05$

Since $0.023 \leq 0.05$ , we reject $H_{0}$
We have convincing evidence that $p > 0.40$

What "Fail to Reject $H_{0}$ " Means

When we fail to reject the null hypothesis:

There is not convincing statistical evidence to support the alternative hypothesis
The observed data are reasonably consistent with $H_{0}$
We do not have sufficient evidence to conclude the parameter differs from the null value

Example: $H_{0} : p = 0.60$ , $H_{a} : p \neq 0.60$ , p-value = $0.18$ , $α = 0.05$

Since $0.18 > 0.05$ , we fail to reject $H_{0}$
We do not have convincing evidence that $p$ differs from $0.60$

Critical Distinction: We Never "Accept" or "Prove"

IMPORTANT: A hypothesis test can lead to:

Rejecting the null hypothesis
Failing to reject the null hypothesis

A hypothesis test CANNOT lead to:

"Accepting" the null hypothesis
"Proving" the null hypothesis is true
"Concluding" the null hypothesis is correct

Why We Don't Accept $H_{0}$

Failing to reject $H_{0}$ means our data are consistent with $H_{0}$ , but that doesn't mean $H_{0}$ is true. Many other values might also be consistent with our data—we simply lack evidence to distinguish between them.

Analogy: In court, "not guilty" doesn't mean "innocent"—it means insufficient evidence for conviction. Similarly, failing to reject $H_{0}$ means insufficient evidence against it, not proof it's true.

The principle: Lack of evidence against $H_{0}$ is not the same as evidence for $H_{0}$ .

Writing Conclusions in Context

A proper conclusion must be stated in context with several key components.

The Template for Conclusions

If we reject $H_{0}$ : "Since the p-value [value] is less than $α$ [value], we reject $H_{0}$ . We have convincing statistical evidence that [state $H_{a}$ in context of the problem]."

If we fail to reject $H_{0}$ : "Since the p-value [value] is greater than $α$ [value], we fail to reject $H_{0}$ . We do not have convincing statistical evidence that [state $H_{a}$ in context of the problem]."

Required Components

Explicit comparison: State whether p-value $\leq$ or $>$ $α$
Formal decision: "Reject $H_{0}$ " or "Fail to reject $H_{0}$ "
Evidence statement: "We have/do not have convincing statistical evidence"
Alternative hypothesis in context: State what you're testing for, using the actual context
Parameter and population: Reference the population proportion and population

Using Non-Definitive Language

Conclusions should use tentative, non-definitive language:

Good words to use:

"suggests," "indicates," "provides evidence"
"convincing statistical evidence"
"appears to be," "seems to be"

Avoid definitive language:

"proves," "establishes with certainty"
"definitely is," "absolutely"
"must be," "has to be"

Examples of Good Conclusions

Example 1: Testing if proportion of defective items exceeds 0.05

$H_{0} : p = 0.05$ , $H_{a} : p > 0.05$
p-value = $0.018$ , $α = 0.05$

Good conclusion: "Since the p-value of $0.018$ is less than the significance level of $0.05$ , we reject $H_{0}$ . We have convincing statistical evidence that the true proportion of defective items produced by this machine exceeds $0.05$ ."

Example 2: Testing if customer satisfaction has changed from 0.72

$H_{0} : p = 0.72$ , $H_{a} : p \neq 0.72$
p-value = $0.31$ , $α = 0.05$

Good conclusion: "Since the p-value of $0.31$ is greater than the significance level of $0.05$ , we fail to reject $H_{0}$ . We do not have convincing statistical evidence that the true proportion of satisfied customers differs from $0.72$ ."

Common Mistakes in Conclusions

Incorrect: "We accept the null hypothesis." Correct: "We fail to reject the null hypothesis."

Incorrect: "We proved that the proportion is greater than 0.40." Correct: "We have convincing evidence that the proportion is greater than 0.40."

Incorrect: "The null hypothesis is true." Correct: "We do not have convincing evidence against the null hypothesis."

Incorrect: "The result is significant." (without context) Correct: "We have convincing statistical evidence that the true proportion of voters who support the measure exceeds 0.50."

Connecting to the Investigative Question

The hypothesis test serves as statistical reasoning to support the answer to a research question.

From Statistics to Science

The hypothesis test provides the statistical justification, which then informs the practical or scientific conclusion.

Example investigative question: "Has the company's new customer service training improved satisfaction rates?"

Statistical work:

Collect data on current satisfaction
Test $H_{0} : p = 0.68$ (old rate) vs. $H_{a} : p > 0.68$
Get p-value = $0.004$
Reject $H_{0}$ at $α = 0.05$

Answer to investigative question: "Yes, there is convincing statistical evidence that the customer satisfaction rate has increased from the previous rate of $0.68$ . The data suggest the training program was effective."

The Bridge from Data to Decision

The hypothesis test tells us:

Whether the data provide evidence (decision to reject or not)
How strong the evidence is (p-value)
Direction of the effect (from $H_{a}$ )

This statistical evidence then informs practical decisions:

Should we implement the new policy?
Should we continue using this supplier?
Should we recommend the treatment?

Complete Example: Full Hypothesis Test

Scenario: A pharmaceutical company claims that $85\%$ of patients experience symptom relief with their medication. A consumer advocacy group suspects the true rate is lower. They conduct a study with a random sample of $300$ patients from thousands who have used the medication. Of these $300$ patients, $240$ experienced symptom relief. Test the advocacy group's suspicion at the $α = 0.05$ significance level.

Step 1: State the parameter and hypotheses

Parameter: Let $p$ = the true proportion of all patients who experience symptom relief with this medication

Hypotheses:

$H_{0} : p = 0.85$
$H_{a} : p < 0.85$ (one-sided because we suspect the rate is lower)

Step 2: Check conditions

Randomization: The problem states a random sample was selected from all patients who have used the medication. Condition is satisfied.

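10% condition: The sample of $300$ patients must be no more than $10\%$ of the population, which requires a population of at least $3000$ patients. The problem states that thousands of patients have used the medication, so we treat this condition as satisfied.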

Large counts condition: $n p_{0} = 300 (0.85) = 255 \geq 10$ ✓ $n (1 - p_{0}) = 300 (0.15) = 45 \geq 10$ ✓

All conditions are satisfied for a one-sample z-test for a population proportion.
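The two numeric conditions can be checked mechanically, as in the sketch below. The function name `check_conditions` and its return format are assumptions made for illustration. Randomization cannot be verified by code; it has to come from how the data were collected. Because the scenario only says "thousands" of patients, the population size is left unspecified here and the 10% check reports `None` rather than guessing a number.

```python
def check_conditions(n, p0, population_size=None):
    """Hypothetical helper: check the numeric conditions for a one-sample z-test."""
    # Large counts: expected successes and failures under H0 are both at least 10.
    large_counts = n * p0 >= 10 and n * (1 - p0) >= 10
    # 10% condition: only checkable when the population size is known.
    if population_size is None:
        ten_percent = None
    else:
        ten_percent = n <= 0.10 * population_size
    return {"large_counts": large_counts, "ten_percent": ten_percent}

# Medication example: n = 300, p0 = 0.85, population size not given exactly
print(check_conditions(300, 0.85))
# {'large_counts': True, 'ten_percent': None}
```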

Step 3: Calculate test statistic

Sample proportion: $\hat{p} = \frac{240}{300} = 0.80$

Test statistic: $z = \frac{\hat{p} - p_{0}}{\sqrt{\frac{p_{0} ( 1 - p_{0} )}{n}}}$

$z = \frac{0.80 - 0.85}{\sqrt{\frac{0.85 ( 0.15 )}{300}}}$

$z = \frac{- 0.05}{\sqrt{\frac{0.1275}{300}}}$

$z = \frac{- 0.05}{\sqrt{0.000425}}$

$z = \frac{- 0.05}{0.0206}$

$z \approx - 2.43$

Step 4: Find p-value

Since $H_{a} : p < 0.85$ (left-tail), the p-value is: $\text{p-value} = P (z \leq - 2.43)$

Using technology or tables: p-value $\approx 0.0075$

Step 5: Interpret p-value

"The p-value of $0.0075$ is the probability of obtaining a sample proportion of $0.80$ or less in a random sample of $300$ patients, assuming that the true proportion of all patients who experience symptom relief is $0.85$ ."

Step 6: Make formal decision

Since p-value = $0.0075 < 0.05 = α$ , we reject $H_{0}$ .

Step 7: Write conclusion in context

"Since the p-value of $0.0075$ is less than the significance level of $0.05$ , we reject $H_{0}$ . We have convincing statistical evidence that the true proportion of all patients who experience symptom relief with this medication is less than $0.85$ . The data support the advocacy group's suspicion that the company's claim of $85\%$ relief is overstated."

Step 8: Answer the investigative question

The statistical evidence suggests that the pharmaceutical company's claim of $85\%$ symptom relief is not supported by the data. The true relief rate appears to be lower than claimed, which has important implications for patients and prescribing physicians.
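For readers who want to verify the calculation steps of this example in one place, here is a rough Python sketch in the spirit of a calculator's one-proportion z-test. The function `one_prop_z_test`, its argument names, and its return format are assumptions for illustration; it does not check conditions or write the conclusion in context, which still have to be done by hand.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative, alpha=0.05):
    """Hypothetical helper: test statistic, p-value, and formal decision."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0
    area_left = NormalDist().cdf
    if alternative == "less":
        p = area_left(z)
    elif alternative == "greater":
        p = 1 - area_left(z)
    elif alternative == "two-sided":
        p = 2 * (1 - area_left(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    return z, p, decision

# Medication example: 240 of 300 patients, H0: p = 0.85, Ha: p < 0.85
z, p, decision = one_prop_z_test(240, 300, 0.85, "less", alpha=0.05)
print(round(z, 2), decision)  # -2.43 reject H0
```

Because the code does not round z before looking up the area, its p-value is a little higher than the $0.0075$ found from $z = -2.43$ , but it is still well below $0.05$ and the decision is unchanged.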

Another Complete Example: Two-Sided Test

Scenario: A political analyst claims that exactly $60\%$ of voters in a district support increasing education funding. A researcher believes the actual support level may be different. In a random sample of $250$ voters from the district's $18{,}000$ registered voters, $165$ support the increase. Test at $α = 0.05$ .

Solution

Parameter: Let $p$ = the true proportion of all registered voters in this district who support increasing education funding

Hypotheses:

$H_{0} : p = 0.60$
$H_{a} : p \neq 0.60$ (two-sided: testing for any difference)

Conditions:

Random sample: stated ✓
10%: $250 < 0.10 (18000) = 1800$ ✓
Large counts: $250 (0.60) = 150 \geq 10$ and $250 (0.40) = 100 \geq 10$ ✓

Test statistic: $\hat{p} = \frac{165}{250} = 0.66$

$z = \frac{0.66 - 0.60}{\sqrt{\frac{0.60 ( 0.40 )}{250}}} = \frac{0.06}{\sqrt{0.00096}} = \frac{0.06}{0.031} \approx 1.94$

p-value (two-sided): $\text{p-value} = 2 \cdot P (z \geq 1.94) = 2 (0.0262) = 0.0524$

Decision: Since $0.0524 > 0.05$ , we fail to reject $H_{0}$ .

Conclusion: "Since the p-value of $0.0524$ is greater than the significance level of $0.05$ , we fail to reject $H_{0}$ . We do not have convincing statistical evidence that the true proportion of registered voters who support increasing education funding differs from $0.60$ ."

Note: This is a borderline result! The p-value is very close to $α$ . The evidence against $H_{0}$ is moderate but not quite strong enough to reject at the $0.05$ level.
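To see how close this result is to the threshold, the sketch below recomputes the two-sided p-value without intermediate rounding and compares it with several common significance levels. The comparison across levels is purely illustrative: in a real study $α$ is fixed before the data are analyzed, not chosen afterward.

```python
from math import sqrt
from statistics import NormalDist

# Voter example: 165 of 250 support the increase, H0: p = 0.60, Ha: p != 0.60
n, successes, p0 = 250, 165, 0.60
p_hat = successes / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided: both tails

# Illustration only: the decision flips between alpha = 0.05 and alpha = 0.10
for alpha in (0.01, 0.05, 0.10):
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
# alpha = 0.01: fail to reject H0
# alpha = 0.05: fail to reject H0
# alpha = 0.10: reject H0
```

The unrounded p-value is about $0.053$ , essentially the same as the $0.0524$ above, so the conclusion at $α = 0.05$ does not depend on rounding.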

Understanding Statistical Significance

A result is statistically significant if we reject the null hypothesis (i.e., p-value $\leq α$ ).

What Statistical Significance Means

Statistical significance indicates:

The observed effect is unlikely to be due to chance alone
We have evidence of a real difference or effect
The result is inconsistent with the null hypothesis

What Statistical Significance Does NOT Mean

Statistical significance does NOT indicate:

The effect is large or important
The result is practically meaningful
The result is certainly true
The null hypothesis is definitely false

Statistical vs. Practical Significance

An effect can be:

Statistically significant but not practically important: With huge samples, tiny differences can be statistically significant but meaningless in practice
Practically important but not statistically significant: With small samples, large effects might not reach statistical significance

Example: A weight loss program reduces average weight by $0.5$ pounds with p-value = $0.003$

Statistically significant (very small p-value)
Not practically meaningful (half a pound is negligible)
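The effect of sample size on significance can be illustrated for proportions as well. The numbers below are invented purely for illustration (a null value of $0.50$ and an observed sample proportion of $0.51$ ), and `two_sided_p_value` is a hypothetical helper name. The same one-percentage-point difference is nowhere near significant with $100$ observations but overwhelmingly significant with $100{,}000$ .

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(p_hat, n, p0):
    """Hypothetical helper: two-sided p-value for a one-proportion z-test."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Made-up values: the same tiny difference at two sample sizes
p0, p_hat = 0.50, 0.51
p_small = two_sided_p_value(p_hat, 100, p0)      # z = 0.2, large p-value
p_large = two_sided_p_value(p_hat, 100_000, p0)  # z is about 6.3, tiny p-value

print(p_small > 0.05)   # True: not statistically significant
print(p_large < 0.001)  # True: statistically significant
```

Whether a one-point difference matters in practice is a separate question that the p-value cannot answer.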

The Logic of Hypothesis Testing: A Summary

The hypothesis testing procedure follows this logical flow:

Assume the null hypothesis is true
Calculate how likely our observed data would be under this assumption
Evaluate whether the data are consistent with the assumption
Decide whether to reject the assumption based on the evidence
Conclude in the context of the research question

This framework provides a standardized, objective method for making decisions based on data while acknowledging uncertainty.

Common Pitfalls and How to Avoid Them

Pitfall 1: Using $\hat{p}$ instead of $p_{0}$ in the test statistic

Incorrect: $z = \frac{\hat{p} - p_{0}}{\sqrt{\frac{\hat{p} ( 1 - \hat{p} )}{n}}}$

Correct: $z = \frac{\hat{p} - p_{0}}{\sqrt{\frac{p_{0} ( 1 - p_{0} )}{n}}}$

The denominator must use $p_{0}$ because we're assuming $H_{0}$ is true.
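The choice of denominator changes the answer. As a quick check, the sketch below reuses the mobile app example ( $48$ of $200$ customers, $p_{0} = 0.30$ ) and computes z both ways; the variable names are illustrative.

```python
from math import sqrt

n, successes, p0 = 200, 48, 0.30
p_hat = successes / n

# Correct: standard error computed under H0, using p0
z_correct = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Incorrect for a hypothesis test: standard error based on p_hat
z_wrong = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)

print(round(z_correct, 2))  # -1.85
print(round(z_wrong, 2))    # -1.99
```

Here the incorrect version overstates how far the sample is from the null value, which would make the p-value too small.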

Pitfall 2: Choosing the alternative hypothesis after seeing the data

The alternative hypothesis must be determined by the research question before data analysis, not based on what direction the sample data happened to go.

Pitfall 3: Saying "accept $H_{0}$ " instead of "fail to reject $H_{0}$ "

We never accept or prove the null hypothesis. At best, we fail to find evidence against it.

Pitfall 4: Confusing p-value with significance level

$α$ is set before the study (threshold)
p-value is calculated from data (evidence)
They serve different roles

Pitfall 5: Omitting context in conclusions

Always state what the test is about, referencing the specific population and variable, not just abstract statistical decisions.
