Traveling around the world to find the mean number of children in a household can be costly and time-consuming. Furthermore, it is not practical during a pandemic.

However, we can use samples to estimate the population mean and population variance: we calculate the mean and variance of the sample and use them as estimates of the corresponding population values.

You will see some examples of biased and unbiased estimators and learn about the difference.

Definition of Biased and Unbiased Point Estimates

We use samples to estimate population parameters. Unfortunately, the estimate from any single sample is almost always off by some amount. This is not such a bad thing. What we can't tolerate is an estimator that consistently overestimates or consistently underestimates the population parameter. If the overestimates and underestimates balance out, so that the estimates average out to the true value, you would call this an unbiased point estimator. We define unbiased estimators as formulae that on average equal the true value of the parameter. On the other hand, if the estimates are on average above or below the true value, then we would say that the point estimator is biased. Here the estimates are centered above the parameter (positive bias) or below it (negative bias).

Examples of biased point estimate

A simple example would be a dart board. If you are consistently missing the bull's eye but the hits are evenly distributed around the center, then this would be unbiased: the average of your hits would be the bull's eye. On the other hand, if your darts frequently land to the left-hand side of the bull's eye, then this would be described as biased. You may be consistently hitting the left-hand side, but the average of your hits is well away from the bull's eye.

Biased estimate: the average of the values does not hit the bull's eye. Unbiased estimate: the average of the values does hit the bull's eye.

An example of a biased estimator is the sample variance calculated using the following formula:

\({\Sigma_{i=1}^{n}(x_{i}-\bar{x})^{2}\over n }\)

Experiments show that this formula underestimates the population variance on average. You can also prove it algebraically!
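Such an experiment can be sketched as a small simulation. This is a minimal sketch, assuming a normal population with variance 4 and samples of size 5; these settings, the seed, and the function name `mean_of_divide_by_n_variance` are illustrative choices, not part of the original text.

```python
import random

def mean_of_divide_by_n_variance(n=5, trials=20000, sigma=2.0, seed=0):
    """Average the divide-by-n variance estimate over many simulated samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Sum of squared deviations from the sample mean, divided by n
        total += sum((x - sample_mean) ** 2 for x in sample) / n
    return total / trials

average_estimate = mean_of_divide_by_n_variance()
print(f"population variance: {2.0 ** 2}")
print(f"average divide-by-n estimate: {average_estimate:.3f}")
```

Under these assumptions the average estimate should land noticeably below the population variance of 4, which is the underestimation described above.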

Another example is the sample standard deviation. It can be proven that this estimate also underestimates the population standard deviation on average.

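Finally, we should add the range and the median of the sample as further examples. The sample range can never exceed the population range, so it tends to underestimate it, and the sample median can be a biased estimator of the population median, for example when the population is skewed.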

Unbiased Estimator Examples

In contrast, the sample variance calculated using the following formula is an unbiased estimator of the population variance:

\(s^{2} = {\Sigma_{i=1}^{n}(x_{i}-\bar{x})^{2}\over n - 1 }\)

Samples do not necessarily include the extreme values that are present in the population data, so the spread within a sample tends to be smaller than the spread in the population. To compensate, you divide by n-1 rather than n, which makes the variance estimate slightly larger.
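The effect of the two divisors can be seen on a small sample. The sketch below uses Python's standard `statistics` module, where `pvariance` divides by n and `variance` divides by n - 1; the five data values are hypothetical and chosen only for illustration.

```python
import statistics

sample = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical data, not from the original text

divide_by_n = statistics.pvariance(sample)         # squared deviations / n
divide_by_n_minus_1 = statistics.variance(sample)  # squared deviations / (n - 1)

print(f"divide by n:     {divide_by_n:.2f}")
print(f"divide by n - 1: {divide_by_n_minus_1:.2f}")
```

The sum of squared deviations is the same in both cases (13.2 here), so the n - 1 version is always the larger of the two.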

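If we calculate the variance of the population, we can divide by n because the deviations are measured from the true mean \(\mu\) and every data value has the freedom to be any value. For a sample, we do not know the population mean, so we measure the deviations from the sample mean instead, which is unlikely to equal \(\mu\) exactly. The deviations from the sample mean always sum to 0.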

Population deviation = \(x_{i} - \mu\)

Sample Deviation = \(x_{i} - \bar{x}\)

For the sample deviations to sum to 0, the last deviation cannot be any value: it is fixed by the other n - 1, so we have n - 1 degrees of freedom. This provides some explanation for why we divide by n-1 rather than n for the sample.
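The algebraic argument behind the n - 1 divisor can be sketched as follows. This assumes the observations are independent with mean \(\mu\) and variance \(\sigma^{2}\); the symbol \(\sigma^{2}\) for the population variance is introduced here and is not in the original text.

\[\begin{align} \Sigma_{i=1}^{n}(x_{i}-\bar{x})^{2} &= \Sigma_{i=1}^{n}(x_{i}-\mu)^{2} - n(\bar{x}-\mu)^{2} \\ \mathbb{E}\left[\Sigma_{i=1}^{n}(x_{i}-\bar{x})^{2}\right] &= n\sigma^{2} - n\cdot{\sigma^{2}\over n} \\ &= (n-1)\sigma^{2} \end{align}\]

Dividing the sum of squared deviations by n - 1 therefore gives an expected value of exactly \(\sigma^{2}\), whereas dividing by n gives \({n-1\over n}\sigma^{2}\), which is too small.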

Another example would be the sample mean.

\(\bar{x} = {\Sigma_{i=1}^{n}x_{i}\over n}\)

This can be proven to be unbiased: on average, the sample mean equals the population mean.
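A short sketch of that proof, assuming each observation \(x_{i}\) is drawn from a population with mean \(\mu\):

\[\begin{align} \mathbb{E}[\bar{x}] &= \mathbb{E}\left[{\Sigma_{i=1}^{n}x_{i}\over n}\right] \\ &= {1\over n}\Sigma_{i=1}^{n}\mathbb{E}[x_{i}] \\ &= {n\mu\over n} = \mu \end{align}\]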

Given the values below, find the best point estimate for the population mean \(μ\).
\[7.61, 7.17, 9.06, 6.305, 7.805, 7.11, 9.705, 6.11, 8.56, 7.11, 6.455, 9.06\]
Solution:

\[\begin{align} \bar{X}&={\Sigma_{i=1}^{n} x_{i}\over n } \\ \bar{X}&={7.61+7.17+9.06+6.305+7.805+7.11+9.705+6.11+8.56+7.11+6.455+9.06\over 12 } \\ \bar{X}&={92.06\over 12 } \\ \bar{X}&\approx 7.67 \end{align}\]

The best point estimate for the population mean \(μ\) is \(\bar{X}=7.67\).
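The same calculation can be checked in a few lines of Python. This is only a sketch of the arithmetic above; the variable names are illustrative.

```python
values = [7.61, 7.17, 9.06, 6.305, 7.805, 7.11,
          9.705, 6.11, 8.56, 7.11, 6.455, 9.06]

# Sample mean: sum of the observations divided by the number of observations
sample_mean = sum(values) / len(values)

print(round(sample_mean, 2))  # prints 7.67
```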

The population proportion can also be estimated using an unbiased estimator: divide the number of successes (x) by the sample size (n). This can be expressed as:

\( \hat{p} = {x\over n}\).

Here x is the number of successes and n is the sample size. Before the sample is drawn, x is a binomial random variable with expected value np, where p is the population proportion.

\(\mathbb{E}[\hat{p}] =\mathbb{E}[{x\over n }]\)

Since \({1\over n}\) is a constant, you can take it outside the expectation.

\(\mathbb{E}[\hat{p}] ={1\over n} \mathbb{E}[x] = {np\over n} = p\)

Hence, you can see that \(\hat{p}\) is an unbiased estimator of p.
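The result can also be checked by simulation. This is a minimal sketch, assuming a true proportion of 0.3 and samples of size 20; these values, the seed, and the function name `average_sample_proportion` are illustrative and not part of the original text.

```python
import random

def average_sample_proportion(p=0.3, n=20, trials=20000, seed=1):
    """Average p-hat = x / n over many simulated samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        # x counts the successes among n independent trials
        x = sum(1 for _ in range(n) if rng.random() < p)
        total += x / n
    return total / trials

average_p_hat = average_sample_proportion()
print(f"true p: 0.3, average of p-hat: {average_p_hat:.4f}")
```

Individual values of \(\hat{p}\) vary from sample to sample, but under these assumptions their average should sit very close to the true proportion.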

A sample of teacher trainees

A survey was conducted using a sample of 150 teacher trainees in a training school to determine what proportion of them view the services provided to them favorably. Of the 150 trainees, 103 responded that they viewed the services provided to them by the authorities as favorable. Find the point estimate of the population proportion.

Solution

x = 103 and n = 150.

\( \hat{p} = {x\over n}\).

\( \hat{p} = {103\over 150} \approx 0.687\) or 68.7%.

The researchers can therefore take 0.687, or 68.7%, as the point estimate of the population proportion.
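The arithmetic can be reproduced directly, using the values x = 103 and n = 150 from the solution:

```python
x = 103  # trainees who responded favorably
n = 150  # sample size used in the solution

p_hat = x / n  # point estimate of the population proportion

print(f"{p_hat:.3f} or {p_hat:.1%}")  # prints 0.687 or 68.7%
```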

Biased and Unbiased Estimator Formula

Let \(θ\) be the parameter that we want to estimate and let \(\hat{\theta}\) be our estimator of it.

Bias = \(\mathbb{E}(\hat{θ})- θ\)

An estimator is unbiased when

\(\mathbb{E}(\hat{θ}) = θ\)

Here the bias is equal to 0.
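By contrast, a biased estimator has a nonzero bias. As a worked instance, consider the divide-by-n variance estimator. The sketch below takes as given the standard result that its expected value is \({n-1\over n}\sigma^{2}\) for independent observations with population variance \(\sigma^{2}\); the symbol \(\sigma^{2}\) is introduced here for illustration.

\[\begin{align} \text{Bias} &= \mathbb{E}(\hat{\theta}) - \theta \\ &= {n-1\over n}\sigma^{2} - \sigma^{2} \\ &= -{\sigma^{2}\over n} \end{align}\]

The bias is negative, so this estimator underestimates on average, and the bias shrinks toward 0 as n grows.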

It is OK for an individual sample estimate to be wrong, but we want the average of the estimates to hit the true value.

Difference between Biased and Unbiased Estimates in Statistics

Any single estimate is likely to be wrong. Biased estimators consistently overestimate or underestimate the population parameter. The estimates themselves may have high or low variability. The best unbiased estimators have low variability and an average which hits the true value.

Biased and Unbiased Point Estimates - Key takeaways

An estimate from a single sample rarely equals the parameter exactly.
Biased estimators either consistently overestimate or consistently underestimate the parameter.
Unbiased estimators may miss the target, but their estimates are centered on the true value of the population parameter.
The sample mean calculated using the formula \({Σ_{i} X_{i}\over n }\) is an unbiased estimator of the population mean
In order to estimate the population variance with our sample we should use the formula \({Σ_{i} (X_{i}-\bar{X})^2\over n-1 }\)
\({Σ_{i} (X_{i}-\bar{X})^2\over n}\) would be a biased estimator of the population variance
\(\hat{p}\), which is calculated as the number of successes x divided by the sample size n, is an unbiased estimator of the proportion p.