Finding a population parameter such as the population mean exactly would mean collecting data from every member of the population, which is rarely practical or cost-efficient. Instead, you take a sample and use it to build an interval that is likely to contain the parameter. This is called a confidence interval. A narrow interval gives a precise estimate, but it is less likely to contain the true value. The wider the interval, the more likely it is to contain the true value, so for the same sample a wider interval carries a higher confidence level than a narrower one. Confidence intervals contrast with point estimators, where you use your sample to produce a single value as the estimate of the population parameter.

Calculating Confidence Interval

To calculate a confidence interval, certain assumptions have to be met before inferences are made about a population. If the following assumptions do not hold, the results may not be statistically valid:

Observations are independent.
The population is normally distributed, or the sample is large enough for the Central Limit Theorem to apply.

The confidence interval can be calculated using two different test statistics in this study: t-interval for sample sizes less than 30, and z-interval for large sample sizes.

Confidence Interval for the Population Mean

According to the Central Limit Theorem, the distribution of the sample means is approximately normal when the sample size is large.

Normal distribution for large sample sizes

The confidence level is the probability that the z value lies within the chosen limits.

z is the standard normal variable. It almost always lies between -4 and 4, and it measures how many standard deviations of the sample mean separate the sample mean from the true population mean. We want to estimate the population mean, but there is a trade-off: if we narrow the interval, the probability that the interval is correct gets smaller. Let us go for a 95% chance that our confidence interval is correct.

\(P(-a\lt z\lt a) = 0.95\). Using the standard normal distribution tables you can see that a = 1.96.

\(P(-1.96\lt z\lt 1.96) = 0.95\)
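The table lookup can also be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability that z falls between -1.96 and 1.96.
prob = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Going the other way: leave 2.5% in each tail and
# find the value a such that P(z < a) = 0.975.
a = std_normal.inv_cdf(0.975)

print(f"P(-1.96 < z < 1.96) = {prob:.4f}")
print(f"a = {a:.4f}")
```

Both printed values should agree with the table to the precision shown there, that is, a probability of about 0.95 and a value of a of about 1.96.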

Let us recall the formula for z

\(z = {\bar{x} - \mu_{\bar{x}} \over \sigma_{\bar{x}}}\)

\(\bar{x}\) is a random variable which is normally distributed.

\(P(-1.96\lt {\bar{x} - \mu_{\bar{x}} \over \sigma_{\bar{x}}} \lt 1.96) = 0.95 \)

After rearranging

\(P( \bar{x}-1.96\sigma_{\bar{x}}\lt \mu_{\bar{x}}\lt \bar{x}+1.96\sigma_{\bar{x}}) = 0.95 \)

This means that the probability that your interval contains the true mean of all the sample means is 95%.

It can be shown that: \(\mu_{\bar{x}} = \mathbb{E}[\bar{x}] = \mathbb{E}[x] = \mu_{{x}}\)

\(\sigma_{\bar{x}}^2 = \mathbb{Var}(\bar{x}) = {\mathbb{Var}(x)\over n} = {\sigma^2 \over n}\)
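These two results can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population with mean 50 and standard deviation 10 and repeatedly draws samples of size 25; all numbers and names are made up for illustration.

```python
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 10.0, 25   # hypothetical population mean, sd and sample size
repeats = 5000                  # number of samples drawn

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(repeats)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f} (population mean {mu})")
print(f"variance of sample means: {var_of_means:.3f} (sigma^2 / n = {sigma**2 / n})")
```

The mean of the sample means should land close to 50, and their variance close to 100 / 25 = 4, up to simulation noise.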

We can now find an interval for the population mean.

\(P(\bar{x}-1.96\sqrt{{\sigma^2 \over n}}\lt \mu_{x}\lt \bar{x}+1.96\sqrt{{\sigma^2 \over n}}) = 0.95 \)
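One way to read the 95% is as a long-run frequency: if the sampling were repeated many times, about 95% of the intervals built this way would contain the population mean. The following sketch checks this by simulation, assuming a hypothetical normal population with a known standard deviation; the parameter values are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(1)          # fixed seed so the run is repeatable
mu, sigma, n = 20.0, 4.0, 40    # hypothetical population mean, sd and sample size
repeats = 4000

# Half-width of the 95% interval when sigma is known.
margin = 1.96 * math.sqrt(sigma**2 / n)

hits = 0
for _ in range(repeats):
    x_bar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    # Count the intervals that contain the true population mean.
    if x_bar - margin < mu < x_bar + margin:
        hits += 1

coverage = hits / repeats
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The printed fraction should be close to 0.95, with small deviations due to the finite number of repetitions.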

It is likely that we do not know the population variance, so we will need to compute the sample variance s² and use it instead.

\(P(\bar{x}-1.96\sqrt{{s^2 \over n}}\lt \mu_{x}\lt \bar{x}+1.96\sqrt{{s^2 \over n}}) = 0.95 \)
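As an illustration, the interval above can be computed directly from a sample. This is a minimal sketch using only the Python standard library; the function name `z_interval` and the data are hypothetical, chosen only to have something to run.

```python
import math
from statistics import NormalDist, fmean, variance

def z_interval(sample, confidence=0.95):
    """Large-sample confidence interval for the population mean."""
    n = len(sample)
    x_bar = fmean(sample)
    s2 = variance(sample)  # sample variance (divides by n - 1)
    # Critical value: split the remaining probability between the two tails.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(s2 / n)
    return x_bar - margin, x_bar + margin

# Made-up data: the integers 1 to 100.
sample = list(range(1, 101))
low, high = z_interval(sample)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

The interval is centred on the sample mean (50.5 for this data), and its half-width is the critical value times \(\sqrt{s^2 / n}\).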

There are other confidence levels that you may use such as 0.90. This confidence interval will be the following:

\(P(\bar{x}-1.645\sqrt{{\sigma^2 \over n}}\lt \mu_{x}\lt \bar{x}+1.645\sqrt{{\sigma^2 \over n}}) = 0.90 \)

Again you will need to use the standard normal distribution tables to see that \( P(-1.645\lt z \lt 1.645) = 0.90\)
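To see how the confidence level changes the width of the interval, the sketch below computes the critical value and the half-width for three common levels. The values s = 5 and n = 100 are assumed for illustration, and `z_critical` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Value a such that P(-a < z < a) equals the confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

s, n = 5.0, 100   # assumed sample standard deviation and sample size
margins = {}
for level in (0.90, 0.95, 0.99):
    z = z_critical(level)
    margins[level] = z * math.sqrt(s**2 / n)
    print(f"{level:.0%}: z = {z:.3f}, interval = x_bar +/- {margins[level]:.3f}")
```

The half-width grows with the confidence level, which is the trade-off between precision and confidence described earlier.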

t-distribution for sample sizes less than 30

For such data, it is most appropriate to use the t-distribution to calculate the confidence interval. This is because for a small sample size there is a greater chance of extreme values, which implies fatter tails than those of the normal distribution.

Figure: the t-distribution at different degrees of freedom.

\(\nu\) is the degrees of freedom, \(\nu = n - 1\). As the sample size increases, the t-distribution approaches the normal distribution. There are n-1 degrees of freedom because, once the sample mean has been fixed, only n-1 of the n observations are free to vary.

\(\bar{x}-t\sqrt{{s^2 \over n}}\lt \mu_{x}\lt \bar{x}+t\sqrt{{s^2 \over n}}\)

with n-1 degrees of freedom.
Here \(\bar{x}\) is the sample mean,
t is the critical value of t obtained from a t-table,
s is the sample standard deviation and n is the sample size.

t is obtained by:

first deciding on the confidence level (99%, 95% and 90% are the conventional levels);
then finding the critical value corresponding to α = 1 - confidence level, split equally between the two tails, with n-1 degrees of freedom.
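If a t-table is not to hand, the critical value can be approximated numerically. The sketch below is one possible approach using only the Python standard library: it integrates the t-distribution density with Simpson's rule and searches for the critical value by bisection. The function names are hypothetical, and statistical libraries provide this lookup directly (for example `scipy.stats.t.ppf` in SciPy).

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so half the probability lies below 0.
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Value t such that P(-t < T < t) equals the confidence level."""
    target = 0.5 + confidence / 2   # alpha / 2 is left in the upper tail
    lo, hi = 0.0, 100.0
    for _ in range(60):             # bisection on the cumulative probability
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: 95% confidence with a sample of size n = 10, so n - 1 = 9.
print(f"t = {t_critical(0.95, 9):.3f}")
```

For 9 degrees of freedom at 95% confidence this should reproduce the tabulated value of about 2.262, noticeably larger than the 1.96 used for the normal distribution.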

Example: find the 95% confidence interval for n = 100, t = 1.96, s = 5 and \(\bar{x} = 35\).

\(\bar{x}-t\sqrt{{s^2 \over n}}\lt \mu_{x}\lt \bar{x}+t\sqrt{{s^2 \over n}}\)

\(35-1.96\sqrt{5^2 \over 100}\lt \mu_{x}\lt 35+1.96\sqrt{5^2 \over 100}\)

\(34.02 \lt \mu_{x}\lt 35.98\)

Therefore, with 95% confidence, the population mean lies between 34.02 and 35.98.
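The arithmetic of this example can be reproduced in a few lines. This is a small sketch with the values from the example (n = 100, t = 1.96, s = 5, sample mean 35) hard-coded; the variable names are illustrative.

```python
import math

n = 100       # sample size
t = 1.96      # critical value given in the example
s = 5.0       # sample standard deviation
x_bar = 35.0  # sample mean

# Half-width of the interval: t * sqrt(s^2 / n) = 1.96 * 5 / 10.
margin = t * math.sqrt(s**2 / n)
low, high = x_bar - margin, x_bar + margin

print(f"{low:.2f} < mu < {high:.2f}")
```

The half-width is 0.98, which gives the limits 34.02 and 35.98.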

We can further explore confidence intervals by finding the range of values for the expected difference of two means.

Confidence Interval for Proportions

We can estimate an interval for the population proportion p. The normal distribution is used rather than the t-distribution since, according to the Central Limit Theorem, the distribution of sample proportions becomes approximately normal for large sample sizes. We have to ensure that the sample size is sufficiently large.

We can calculate the interval using the following formula:

\(\hat{p}-z\sqrt{\mathbb{Var}[\hat{p}]}\lt p\lt \hat{p}+z\sqrt{\mathbb{Var}[\hat{p}]}\)

\(\hat{p}-z\sqrt{{\hat{p}(1-\hat{p}) \over n}}\lt p\lt \hat{p}+z\sqrt{{\hat{p}(1-\hat{p}) \over n}}\)

\(\hat{p} = {x \over n}\) is the sample proportion. x is the number of successes in the sample which follows the binomial distribution.
\({\hat{p}(1-\hat{p})\over n}\) is the estimated variance of the sample proportion.
z is the critical value at a particular level of confidence.
n is the sample size.
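The formula for proportions can be sketched in code as follows. The function name `proportion_interval` and the counts (60 successes in a sample of 200) are made up for illustration.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n                           # sample proportion
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 60 successes out of 200 observations.
low, high = proportion_interval(60, 200)
print(f"95% interval for p: ({low:.4f}, {high:.4f})")
```

With these counts the sample proportion is 0.3, and the interval extends roughly 0.064 on either side of it.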


Confidence Intervals - Key takeaways

The confidence interval provides a range for the parameter
The confidence level gives the probability that the range contains the true value of the parameter.
When the sample size is large we can use the standard normal distribution to find the range for the population mean
\(\bar{x}-z\sqrt{{s^2 \over n}}\lt \mu_{x}\lt \bar{x}+z\sqrt{{s^2 \over n}}\)
When sample size is small we need to use the t-distribution to find the range for the population mean
\(\bar{x}-t\sqrt{{s^2 \over n}}\lt \mu_{x}\lt \bar{x}+t\sqrt{{s^2 \over n}}\)
The width of a confidence interval is affected by two factors: the confidence level and the size of the sample.