Constructing a Confidence Interval for a Population Proportion

Introduction

So far, we've learned that sample proportions vary from sample to sample, and we've explored the sampling distribution that describes this variation. But in real statistical practice, we don't know the true population proportion—that's what we're trying to estimate!

A confidence interval moves us from point estimates (single numbers) to interval estimates (ranges of plausible values). Instead of saying "I estimate the proportion is 0.62," we say "I am 95% confident the proportion is between 0.58 and 0.66." This approach acknowledges the uncertainty inherent in sampling and provides a more honest, complete answer to questions about populations.

Topic 3.3 is a foundation of inferential statistics. Confidence intervals appear throughout the AP curriculum and apply whenever a population value must be estimated from sample data.

What is a Confidence Interval?

A confidence interval is an interval estimate for a population parameter. Rather than providing a single value, it gives a range of plausible values based on sample data.

The general structure is:

point estimate ± margin of error

For a population proportion, this becomes:

sample proportion ± margin of error

Why We Need Confidence Intervals

Consider two researchers estimating voter support for a proposition:

Researcher A (sample of 50): $\hat{p} = 0.48$

Researcher B (sample of 1000): $\hat{p} = 0.48$

Both have the same point estimate, but clearly the larger sample provides more reliable information. A confidence interval captures this difference:

Researcher A might report: $(0.34, 0.62)$
Researcher B might report: $(0.45, 0.51)$

The interval width reflects our uncertainty, which depends on sample size and variability.
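To see where intervals like these come from, here is a minimal Python sketch. It assumes a 95% confidence level ($z^{*} = 1.96$), which the example above does not state, and it uses the z-interval formula introduced in the next section. The function name `z_interval` is our own choice for illustration.

```python
from math import sqrt

def z_interval(p_hat, n, z_star=1.96):
    """Return (lower, upper) for p_hat ± z_star * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Same point estimate, very different sample sizes.
# z_star = 1.96 (95% confidence) is an assumption, not given in the text.
for label, n in [("Researcher A", 50), ("Researcher B", 1000)]:
    low, high = z_interval(0.48, n)
    print(f"{label} (n={n}): ({low:.2f}, {high:.2f})")
```

Under that assumption, the sketch reproduces both reported intervals: the larger sample shrinks the margin of error while the center stays at 0.48.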

The One-Sample z-Interval for a Population Proportion

When we want to construct a confidence interval for a single population proportion $p$, we use the one-sample z-interval for a population proportion. Don't worry about the "one-sample" part yet; we'll get to that later. For now, just remember that we are using a z-interval for a population proportion.

The Formula

$\hat{p} \pm z^{*} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Where:

$\hat{p}$ = sample proportion (point estimate)
$z^{*}$ = critical value from the standard normal distribution
$n$ = sample size
$\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ = standard error

Important note: We use $\hat{p}$ in the standard error calculation (not $p$) because we don't know the true population proportion; that is exactly what we are trying to estimate. Make sure you understand what each part of this formula does.

Components Explained

Point estimate ($\hat{p}$): The center of our interval, calculated from our sample data
Critical value ( $z^{*}$ ): A multiplier from the standard normal distribution that determines how many standard errors to go out from the center
Standard error: An estimate of the standard deviation of the sampling distribution, calculated using sample data

The Parameter

When stating what we're estimating, we must properly define the parameter in context. The parameter should reference:

The proportion (not mean, not total)
The response variable (what characteristic we're measuring)
The population (who or what we're studying)

Example of a well-stated parameter: "Let $p$ = the true proportion of all registered voters in Ohio who support Issue 1"

Poor parameter statements:

"The proportion" (missing context)
"The proportion of people who agree" (what population?)
"The mean support" (wrong parameter type)

Critical Values and Confidence Levels

The critical value $z^{*}$ determines how wide our interval is and depends on our desired confidence level.

Common Critical Values

Confidence Level	Critical Value
90%	1.645
95%	1.96
98%	2.326
99%	2.576

Note that you do not need to memorize these. A TI-84 calculator, which is allowed on College Board exams, can compute them, or you can look them up in the table provided on your exam, known as Table A. The only value worth remembering is 1.96 for the 95% confidence level.

Understanding Critical Values

The critical value $z^{*}$ is chosen so that the values between $- z^{*}$ and $+ z^{*}$ capture the middle $C$ of the standard normal distribution, where $C$ is the confidence level.

For a 95% confidence level:

The middle 95% of the standard normal distribution lies between $z = - 1.96$ and $z = 1.96$
This leaves 2.5% in each tail, for a combined 5% outside the interval.
So $z^{*} = 1.96$

Key relationship: A higher confidence level requires a larger critical value $z^{*}$, which produces a wider interval.
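The table values can also be computed directly. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `critical_value` is our own. It finds the z-score that leaves half of the leftover area in the upper tail.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z* such that the middle `confidence_level` of the standard
    normal distribution lies between -z* and +z*."""
    tail_area = (1 - confidence_level) / 2  # area left in each tail
    return NormalDist().inv_cdf(1 - tail_area)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
```

Rounded to three decimals, the results match the table of common critical values, and they grow as the confidence level grows.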

Conditions for the One-Sample z-Interval

Remember the three conditions we checked in Topic 3.2? Here is where we apply them in inferential statistics: constructing confidence intervals.

Condition 1: Randomization

The data should be collected using a random sample.

This ensures the sample is representative and unbiased. State clearly how randomization was achieved:

"The company selected a simple random sample of 200 customers"
"Students were randomly selected using a random number generator"

These are both phrases that you should be on the lookout for when checking this condition.

Condition 2: The 10% Condition

When sampling without replacement, the population must be at least 10 times as large as the sample.

Mathematically: $n \leq 0.10 N$, or equivalently $N \geq 10 n$. Use whichever form is easier to check for the problem at hand.

Example:

Sample size: $n = 150$
Population size: $N = 2400$
Check: $150 < 0.10 (2400) = 240$

Condition 3: Large Counts (Normality Condition)

Both the number of successes and the number of failures in the sample must be at least 10.

Check: $n \hat{p} \geq 10$ and $n (1 - \hat{p}) \geq 10$

Check example:

Sample: $n = 120$, $\hat{p} = 0.35$
Successes: $120 (0.35) = 42 \geq 10$ ✓
Failures: $120 (0.65) = 78 \geq 10$ ✓

Why this matters: This ensures the sampling distribution is approximately normal, validating our use of the z-interval.
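The two numeric conditions are easy to check mechanically. Here is a small sketch with hypothetical helper names (`ten_percent_ok`, `large_counts_ok`). Randomization cannot be verified from the numbers alone, so it is left out on purpose.

```python
def ten_percent_ok(n, population_size):
    """10% condition: the population is at least 10 times the sample size."""
    return 10 * n <= population_size

def large_counts_ok(n, p_hat):
    """Large counts: at least 10 successes and at least 10 failures."""
    successes = n * p_hat
    failures = n * (1 - p_hat)
    return successes >= 10 and failures >= 10

# The two examples from this section.
print(ten_percent_ok(150, 2400))
print(large_counts_ok(120, 0.35))
```

Both checks pass for the examples above. A sample of 20 with $\hat{p} = 0.35$ would fail the large counts check, since it has only 7 successes.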

Standard Error

The standard error (SE) is an estimate of the standard deviation of the sampling distribution of $\hat{p}$.

$SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Standard Error vs. Standard Deviation of Sampling Distribution

In Topic 3.2, we learned: $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$

Now we use: $SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

The difference: We use $\hat{p}$ instead of $p$ because we don't know the true population proportion. The standard error is our estimate based on sample data.

Interpretation

The standard error quantifies the typical amount that a sample proportion varies from the true population proportion.

Example: If $S E = 0.028$ , we would say: "The sample proportion typically varies by about 0.028 (or 2.8 percentage points) from the true population proportion."
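To make this interpretation concrete, here is a small simulation sketch. The true proportion (0.39), sample size (400), number of repetitions, and seed are all assumptions chosen for illustration, not real data. It draws many samples from a population with a known $p$ and compares the spread of the resulting sample proportions to the formula from Topic 3.2.

```python
import random
from math import sqrt
from statistics import stdev

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size `n` and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.39, 400  # hypothetical population proportion and sample size
p_hats = simulate_sample_proportions(p, n, reps=2000)

print(f"spread of simulated sample proportions: {stdev(p_hats):.4f}")
print(f"sqrt(p(1 - p)/n) from the formula:      {sqrt(p * (1 - p) / n):.4f}")
```

The two numbers should come out close to each other. In practice we cannot run this simulation because $p$ is unknown, which is why we plug $\hat{p}$ into the formula and call the result the standard error.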

Margin of Error

The margin of error (MOE) is half the width of the confidence interval.

$MOE = z^{*} \cdot SE = z^{*} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

The confidence interval is then: $(\hat{p} - MOE, \hat{p} + MOE)$

Example Calculation

A random sample of 400 adults found that 156 use streaming services exclusively.

Step 1: Calculate $\hat{p}$: $\hat{p} = \frac{156}{400} = 0.39$

Step 2: Calculate standard error: $SE = \sqrt{\frac{0.39 (0.61)}{400}} = \sqrt{\frac{0.2379}{400}} = \sqrt{0.00059475} \approx 0.0244$

Step 3: Find margin of error (95% confidence, so $z^{*} = 1.96$): $MOE = 1.96 (0.0244) \approx 0.0478$

Step 4: Construct interval: $0.39 \pm 0.0478$, which gives $(0.3422, 0.4378)$

Interpretation: "We are 95% confident that the true proportion of all adults who use streaming services exclusively is between 0.342 and 0.438."
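The four steps above can be traced in a few lines of Python. This is only a sketch for checking the arithmetic; the variable names are our own.

```python
from math import sqrt

successes, n, z_star = 156, 400, 1.96  # values from the example; 95% confidence

p_hat = successes / n                              # Step 1: point estimate
standard_error = sqrt(p_hat * (1 - p_hat) / n)     # Step 2: standard error
margin_of_error = z_star * standard_error          # Step 3: margin of error
interval = (p_hat - margin_of_error,               # Step 4: interval
            p_hat + margin_of_error)

print(f"p-hat = {p_hat:.2f}")
print(f"SE    = {standard_error:.4f}")
print(f"MOE   = {margin_of_error:.4f}")
print(f"CI    = ({interval[0]:.4f}, {interval[1]:.4f})")
```

Carrying full precision through every step gives the same interval as the hand calculation after rounding.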

Determining Sample Size (A common type of problem)

Sometimes we need to determine the sample size required to achieve a desired margin of error. We rearrange the margin of error formula to solve for $n$ :

$n = \left(\frac{z^{*}}{MOE}\right)^{2} \hat{p} (1 - \hat{p})$

The Problem

Often, we don't have a preliminary estimate for $\hat{p}$. What do we do?

The Solution

Use $\hat{p} = 0.5$

This gives us the maximum possible sample size needed because $\hat{p} (1 - \hat{p})$ is maximized when $\hat{p} = 0.5$:

When $\hat{p} = 0.5$: $0.5 (0.5) = 0.25$
When $\hat{p} = 0.3$: $0.3 (0.7) = 0.21$
When $\hat{p} = 0.8$: $0.8 (0.2) = 0.16$

Using $\hat{p} = 0.5$ gives an upper bound on the required sample size, so the desired margin of error is achieved no matter what the sample proportion turns out to be.
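The three cases above suggest the pattern. One way to confirm it for every value, sketched here by completing the square:

$\hat{p} (1 - \hat{p}) = \frac{1}{4} - \left(\hat{p} - \frac{1}{2}\right)^{2} \leq \frac{1}{4}$

The squared term is never negative, so the product can never exceed 0.25, and it equals 0.25 only when $\hat{p} = 0.5$.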

Sample Size Formula (Conservative Approach)

$n = (\frac{z ^{*}}{M O E})^{2} (0.25)$

Always round up to the next whole number because you can't sample a fractional person.
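Here is a minimal sketch of the sample size calculation. The function name `required_sample_size`, the parameter name `p_guess`, and the 0.03 margin of error in the usage line are our own choices for illustration. Leaving `p_guess` at its default of 0.5 gives the conservative approach.

```python
from math import ceil

def required_sample_size(margin_of_error, z_star=1.96, p_guess=0.5):
    """Smallest whole-number n with z_star * sqrt(p(1 - p)/n) <= margin_of_error.

    p_guess = 0.5 is the conservative choice when no estimate is available.
    """
    raw = (z_star / margin_of_error) ** 2 * p_guess * (1 - p_guess)
    return ceil(raw)  # always round up, never to the nearest integer

# Hypothetical target: 95% confidence with a margin of error of at most 0.03.
print(required_sample_size(0.03))  # 1068
```

Rounding down, or to the nearest integer, would give a sample size slightly too small to meet the target, which is why the function uses `ceil`.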

Problems of this type come up often, and students frequently make errors on them, so work through the following example carefully.

A candy manufacturer wants to estimate the true proportion of green candies produced. They believe it is around 0.3, or 30%. However, the top dog wants a 95% confidence interval with a margin of error of no more than 0.04. What is the minimum sample size $n$ that lets the manufacturer meet these requirements and keep his job?

First off, we know the formula for the margin of error: $MOE = z^{*} \cdot SE = z^{*} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

We also know the margin of error is 0.04, the confidence level is 95% (so the critical value is 1.96), and our preliminary estimate is $\hat{p} = 0.30$.

We can now just use simple algebra to figure out this problem:

$0.04 = 1.96 \sqrt{\frac{(0.3)(0.7)}{n}} = 1.96 \sqrt{\frac{0.21}{n}}$

Dividing both sides by 1.96: $0.0204 \approx \sqrt{\frac{0.21}{n}}$

Squaring both sides: $0.0004165 \approx \frac{0.21}{n}$

Finally, solving for n: $n = \frac{0.21}{0.0004165} \approx 504.2$

Since we always round up, the minimum sample size is 505.
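As a cross-check, we can substitute the same values directly into the rearranged sample size formula, which avoids rounding any intermediate step:

$n = \left(\frac{1.96}{0.04}\right)^{2} (0.3)(0.7) = 49^{2} (0.21) = 2401 (0.21) = 504.21$

Rounding up again gives 505. Rounding intermediate values by hand can shift the decimal part slightly, but not the final answer here. For comparison, the conservative choice $\hat{p} = 0.5$ would give $2401 (0.25) = 600.25$, which rounds up to 601, so having a preliminary estimate saves 96 observations.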

Complete Example: Constructing a Confidence Interval

Problem: A researcher randomly selects 250 residents from a city of 45,000 and finds that 78 have solar panels installed. Construct a 90% confidence interval for the proportion of all residents in the city who have solar panels.

Step 1: Define the parameter. Let $p$ = the true proportion of all residents in this city who have solar panels installed

Step 2: Check our conditions

Randomization: The problem states residents were randomly selected ✓

10% condition: $n = 250$ and $N = 45000$, so $250 < 0.10 (45000) = 4500$ ✓

Large counts: $\hat{p} = \frac{78}{250} = 0.312$

Successes: $250 (0.312) = 78 \geq 10$ ✓
Failures: $250 (0.688) = 172 \geq 10$ ✓

All conditions are met. We may continue with our problem.

Step 3: Calculate the interval

Critical value for 90% confidence: $z^{*} = 1.645$