In order to find parameters such as the population mean, it is not always practical or cost-efficient to travel around the world and collect all the data. You have to be satisfied with your sample and use it to get an interval for your parameter. This is called a confidence interval. You can aim for a narrow interval to estimate your parameter, but the narrower the interval, the less likely it is to contain the true value. The wider the interval, the more likely it is to contain the true value, so a wider interval carries greater confidence than a narrower one. Confidence intervals contrast with point estimators, where you use your sample to produce a single value as the estimate of the population parameter.
To calculate a confidence interval, certain assumptions have to be met before any inferences are made about a population. If the following assumptions are not verified, the results may be statistically invalid:
Observations are independent.
The population is normally distributed.
The confidence interval can be calculated in two ways here: a t-interval for sample sizes less than 30, and a z-interval for large sample sizes.
According to the Central Limit Theorem, the distribution of the sample means is approximately normal when the sample size is large.
The confidence level is the probability that the z value lies within the chosen limits.
z is the standard normal variable. In practice it lies between -4 and 4, and it measures how many standard deviations the sample mean lies from the true mean of the sample means. We want to estimate the population mean, but there is a trade-off: if we narrow the interval, the probability that the interval is correct gets smaller. Let us go for a 95% chance that our confidence interval is correct.
\(P(-a\lt z\lt a) = 0.95\). Using the standard normal distribution tables you can see that a = 1.96.
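The table lookup can also be done numerically. The following is a minimal sketch using Python's standard library; the helper name `z_critical` is an assumption introduced here for illustration, not part of the original text.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Return a such that P(-a < z < a) = confidence for a standard normal z."""
    alpha = 1.0 - confidence
    # The two tails share alpha equally, so a is the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"confidence {level:.2f}: a = {z_critical(level):.3f}")
```

Rounded to three decimals, this gives 1.645, 1.960 and 2.576 for the three conventional levels.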
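\(P(-1.96\lt z\lt 1.96) = 0.95\)
Let us recall the formula for z:
\(z = {\bar{x}-\mu_{\bar{x}} \over \sqrt{{\sigma^2 \over n}}}\)
\(\bar{x}\) is a random variable which is normally distributed.
After rearranging:
\(P(\bar{x}-1.96\sqrt{{\sigma^2 \over n}}\lt \mu_{\bar{x}}\lt \bar{x}+1.96\sqrt{{\sigma^2 \over n}}) = 0.95\)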
This means that the probability that your interval contains the true mean of all the sample means is 95%.
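The 95% statement can be checked by simulation. The sketch below is only an illustration under stated assumptions: the population is normal, the values `mu=10`, `sigma=2` and `n=30` are hypothetical, and the function name `coverage` is introduced here. It draws many samples, builds a z-interval around each sample mean, and counts how often the interval contains the true mean.

```python
import random
from statistics import NormalDist

def coverage(mu=10.0, sigma=2.0, n=30, trials=5000, confidence=0.95, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * (sigma ** 2 / n) ** 0.5
    hits = 0
    for _ in range(trials):
        # Sample mean of n independent draws from the hypothetical population.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

print(coverage())                 # close to 0.95
print(coverage(confidence=0.90))  # a narrower interval, so fewer hits
```

The observed fraction should sit near the chosen confidence level, and lowering the confidence level narrows the interval and lowers the fraction.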
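It can be shown that: \(\mu_{\bar{x}} = \mathbb{E}[\bar{x}] = \mathbb{E}[x] = \mu_{{x}}\)
We can now find an interval for the population mean:
\(P(\bar{x}-1.96\sqrt{{\sigma^2 \over n}}\lt \mu_{x}\lt \bar{x}+1.96\sqrt{{\sigma^2 \over n}}) = 0.95\)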
We are unlikely to know the population variance \(\sigma^2\), so we will need to compute the sample variance \(s^2\) and use it instead.
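As a small sketch of that step, the sample variance and the resulting standard error \(\sqrt{s^2/n}\) can be computed with Python's standard library. The data values below are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean, variance

# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(sample)
xbar = mean(sample)                # sample mean
s2 = variance(sample)              # sample variance, with n - 1 in the denominator
standard_error = (s2 / n) ** 0.5   # sqrt(s^2 / n), the term used in the interval

print(f"n = {n}, mean = {xbar:.3f}, s^2 = {s2:.3f}, standard error = {standard_error:.3f}")
```

Note that `statistics.variance` divides by n - 1, while `statistics.pvariance` divides by n and would be the choice only if the data were the whole population.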
There are other confidence levels that you may use such as 0.90. This confidence interval will be the following:
\(P(\bar{x}-1.645\sqrt{{\sigma^2 \over n}}\lt \mu_{x}\lt \bar{x}+1.645\sqrt{{\sigma^2 \over n}}) = 0.90 \)
Again, you will need to use the standard normal distribution tables to see that \( P(-1.645\lt z \lt 1.645) = 0.90\).
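The z-interval for different confidence levels can be sketched as a single function. This is an illustrative sketch rather than a definitive implementation: the name `z_interval` and the input values (sample mean 50, population standard deviation 10, n = 25) are assumptions introduced here.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Interval xbar -/+ z * sqrt(sigma^2 / n) for a known population sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(sigma ** 2 / n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs, for illustration only.
for level in (0.90, 0.95):
    low, high = z_interval(xbar=50, sigma=10, n=25, confidence=level)
    print(f"{level:.0%}: {low:.2f} < mu < {high:.2f}")
```

With the same data, the 90% interval comes out narrower than the 95% interval, which matches the trade-off between width and confidence described earlier.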
t-distribution
\(\nu\) is the degrees of freedom, \(\nu = n - 1\). As the sample size increases, the t-distribution approaches the standard normal distribution. There are n-1 degrees of freedom because once the sample mean is fixed, only n-1 of the n observations are free to vary.
\(\bar{x}-t\sqrt{{s^2 \over n}}\lt \mu_{x}\lt \bar{x}+t\sqrt{{s^2 \over n}}\)
t is obtained by:
first deciding on the confidence level for the interval (99%, 95% and 90% are the conventional levels);
next finding the two-tailed critical value corresponding to \(\alpha\) = 1 - confidence level, with n-1 degrees of freedom.
Find the confidence interval for n = 100, t = 1.96 for a 95% confidence interval, s = 5 and \(\bar{x}\) = 35.
\(\bar{x}-t\sqrt{{s^2 \over n}}\lt \mu_{x}\lt \bar{x}+t\sqrt{{s^2 \over n}}\)
\(35-1.96\sqrt{5^2 \over 100}\lt \mu_{x}\lt 35+1.96\sqrt{5^2 \over 100}\)
\(34.02 \lt \mu_{x}\lt 35.98\)
Therefore, with 95% confidence, the population mean lies between 34.02 and 35.98.
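The arithmetic of this worked example can be checked with a few lines of Python, using the sample size n = 100 under the square root. The function name `mean_interval` is an assumption introduced here; the critical value 1.96 is passed in as given in the example rather than looked up.

```python
from math import sqrt

def mean_interval(xbar, s, n, critical):
    """Interval xbar -/+ critical * sqrt(s^2 / n)."""
    half_width = critical * sqrt(s ** 2 / n)
    return xbar - half_width, xbar + half_width

# Values taken from the worked example: n = 100, t = 1.96, s = 5, sample mean 35.
low, high = mean_interval(xbar=35, s=5, n=100, critical=1.96)
print(f"{low:.2f} < mu < {high:.2f}")  # prints "34.02 < mu < 35.98"
```

Here the half-width is 1.96 × 5 / 10 = 0.98, which reproduces the limits 34.02 and 35.98.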
We can further explore confidence intervals by finding the range of values for the expected difference of two means.
We can also estimate an interval for the population proportion p. The normal distribution is used rather than the t-distribution since, according to the Central Limit Theorem, the distribution of sample proportions becomes approximately normal for large sample sizes. We have to ensure that the sample size is sufficiently large.
We can calculate the interval using the following formula:
\(\hat{p}-z\sqrt{\mathbb{Var}[\hat{p}]}\lt p\lt \hat{p}+z\sqrt{\mathbb{Var}[\hat{p}]}\)
\(\hat{p}-z\sqrt{{\hat{p}(1-\hat{p}) \over n}}\lt p\lt \hat{p}+z\sqrt{{\hat{p}(1-\hat{p}) \over n}}\)
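The proportion formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `proportion_interval` and the counts (120 successes out of 400 trials) are hypothetical, and the sample is assumed large enough for the normal approximation to hold.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Interval p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n  # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical counts, for illustration only.
low, high = proportion_interval(successes=120, n=400)
print(f"{low:.4f} < p < {high:.4f}")
```

With these counts the sample proportion is 0.30, and the interval is centred on that value.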
\(\bar{x}-z\sqrt{{s^2 \over n}}\lt \mu_{x}\lt \bar{x}+z\sqrt{{s^2 \over n}}\)
\(\bar{x}-t\sqrt{{s^2 \over n}}\lt \mu_{x}\lt \bar{x}+t\sqrt{{s^2 \over n}}\)