Traveling around the world to find the mean number of children in a household would be very expensive and time-consuming, and it is not practical during a pandemic. However, we can use samples to estimate the population mean and population variance. To do this, we calculate the mean and variance of the sample in a way that gives the best estimate of the population mean and population variance. You will learn how to compute the best estimates of these parameters and see why these estimates are unbiased.

Best Unbiased Estimators: this method uses an estimator whose expected value is equal to the parameter.

Best unbiased estimator

Let θ be the parameter that we want to estimate and \(\hat{θ}\) be our estimator of it.

Bias = \(\mathbb{E}(\hat{θ})- θ\)

An unbiased estimator has zero bias, which means that

\(\mathbb{E}(\hat{θ}) = θ\)

Sample mean as an estimator of the population mean

\( X̄={Σ_{i} X_{i}\over n }\)

Let us remind ourselves of some key properties of expected values:

Property 1

Suppose we have a bag of 10 dice. Each die has the same distribution: they are all fair.

\(\mathbb{E}[X_{1}]=\mathbb{E}[X_{2}] = ... = \mathbb{E}[X_{n}] = \mathbb{E}[X] =μ \)

Every die in the bag has the same expected value as any other fair die. This expected value is the population mean.

Property 2

\(\mathbb{E}[X_{1}+X_{2} + ... + X_{n}] = \mathbb{E}[X_{1}]+\mathbb{E}[X_{2}] + ... + \mathbb{E}[X_{n}]\)

This holds for any random variables, whether or not they are independent. It can be expressed as:

\(\mathbb{E}[Σ_{i} X_{i}] = Σ_{i} \mathbb{E}[X_{i}]\)

A tangible example would be two dice. The expected value of each die is 3.5, so the expected value of the sum of the two dice is 3.5 + 3.5 = 7.
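
The two-dice example can be checked by listing every outcome. The short sketch below is only an illustration (the variable names are our own): it uses exact fractions to compute the expected value of one fair die and of the sum of two fair dice.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Expected value of one fair die: each face has probability 1/6.
e_single = sum(Fraction(face, 6) for face in faces)

# Expected value of the sum of two dice, over all 36 equally likely outcomes.
e_sum = sum(Fraction(a + b, 36) for a, b in product(faces, repeat=2))

print(e_single)  # 7/2
print(e_sum)     # 7
```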

Property 3

\( \mathbb{E}[aY] = a\mathbb{E}[Y]\)

where a is a constant and Y is a random variable.

Now we are ready for the proof

\(\mathbb{E}[\bar{X}] = \mathbb{E}[{Σ_{i} X_{i}\over n }]\)

We can take out the constant - property 3

\(\mathbb{E}[X̄] ={1\over n } \mathbb{E}[Σ_{i} X_{i}]\)

Furthermore we can take out Σ - property 2

\(\mathbb{E}[X̄] ={1\over n }Σ_{i}\mathbb{E}[X_{i}]\)

The expected value of an item in a sample is equal to the population mean - property 1

\(\mathbb{E}[X̄] ={1\over n }Σ_{i}μ\)

The sum has n terms, each equal to μ, so

\(\mathbb{E}[X̄] ={1\over n }nμ =μ\)

Here we can see that the expected value of the sample mean is the population mean.
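
As a small sanity check of this result, the sketch below averages the sample mean over every possible sample of three fair dice. The sample size of 3 is an arbitrary choice made for illustration, and the variable names are our own.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
n = 3  # sample size (an arbitrary choice for this illustration)

mu = Fraction(sum(faces), 6)  # population mean of one fair die

# Average the sample mean over all 6**n equally likely samples.
samples = list(product(faces, repeat=n))
expected_sample_mean = sum(Fraction(sum(s), n) for s in samples) / len(samples)

print(mu, expected_sample_mean)  # 7/2 7/2
```

The two values agree exactly, as the proof predicts.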

Given the values below, find the best point estimate for the population mean μ.
7.61, 7.17, 9.06, 6.305, 7.805, 7.11, 9.705, 6.11, 8.56, 7.11, 6.455, 9.06

\( X̄={Σ_{i} X_{i}\over n }\)

\( X̄={7.61+7.17+9.06+6.305+7.805+7.11+9.705+6.11+8.56+7.11+6.455+9.06\over 12 }\)

\( X̄={92.06\over 12 }\)

\( X̄≈7.67\)

The best point estimate for the population mean μ is 7.67.
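
The same calculation can be done in a couple of lines of Python. This is a minimal sketch using the standard library's `statistics.mean`; the list name `data` is our own.

```python
import statistics

data = [7.61, 7.17, 9.06, 6.305, 7.805, 7.11,
        9.705, 6.11, 8.56, 7.11, 6.455, 9.06]

# The sample mean: sum of the values divided by the number of values.
x_bar = statistics.mean(data)

print(round(x_bar, 2))  # 7.67
```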

Sample variance as an estimator of the population variance

\( S^{2}={Σ_{i} (X_{i}-X̄)^2\over n-1 }\)

\(\mathbb{E}[S^{2}] = \mathbb{E}[{Σ_{i} (X_{i}-X̄)^2\over n-1 }]\)

Let us use the property \( \mathbb{E}[aY]= a\mathbb{E}[Y]\)

\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-X̄)^2]\)

You can use a simple trick that -μ+μ = 0

\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-μ+μ-X̄)^2]\)

Let us expand the brackets

\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-μ)^2-2(X_{i}-μ)(X̄-μ)+(X̄-μ)^2]\)

Next you can use the property \(Σ_{i} (r_{i} + p_{i}) = Σ_{i} r_{i} + Σ_{i}p_{i}\)

\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-μ)^2-2Σ_{i}(X_{i}-μ)(X̄-μ)+Σ_{i}(X̄-μ)^2]\)

For the middle term we will divide by n and multiply by n.

\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-μ)^2-2nΣ_{i}({X_{i}-μ\over n})(X̄-μ)+Σ_{i}(X̄-μ)^2]\)

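Since \(Σ_{i} {X_{i}-μ\over n} = X̄-μ\), the middle term becomes \(-2n(X̄-μ)^{2}\). We can then take the expected value of each term separately, using properties 2 and 3. Let us also remember that \(Σ_{i} C = nC\) where C does not depend on i; we will use this for the last term in the next step.

\(\mathbb{E}[S^{2}] = {1\over n-1} [Σ_{i}\mathbb{E}[(X_{i}-μ)^2]-2n\mathbb{E}[(X̄-μ)^{2}]+Σ_{i}\mathbb{E}[(X̄-μ)^2]]\)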

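Now you can express the equation in terms of variance, since \(\mathbb{E}[(X_{i}-μ)^2] = \mathbb{VAR}(X_{i})\) and, because \(\mathbb{E}[X̄] = μ\), \(\mathbb{E}[(X̄-μ)^2] = \mathbb{VAR}(X̄)\):

\(\mathbb{E}[S^{2}] = {1\over n-1} [Σ_{i}\mathbb{VAR}(X_{i})-2n\mathbb{VAR}(X̄)+n\mathbb{VAR}(X̄)]\)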

Here we need to use the property that \(\mathbb{VAR}(X̄) = {\mathbb{VAR}(X_{i})\over n}\), which holds when the observations in the sample are independent.

\(\mathbb{E}[S^{2}] = {1\over n-1}[Σ_{i} \mathbb{VAR}(X_{i})-2n{\mathbb{VAR}(X_{i})\over n}+n{\mathbb{VAR}(X_{i})\over n}]\)

Let us rewrite \(\mathbb{VAR}(X_{i})\) as \(σ^2\)

\(\mathbb{E}[S^{2}] = {1\over n-1}[nσ^{2}-2n{σ^{2}\over n}+σ^{2}]\)

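Finally, we just need to simplify. Since \(2n{σ^{2}\over n} = 2σ^{2}\), the bracket equals \(nσ^{2}-2σ^{2}+σ^{2} = (n-1)σ^{2}\), so:

\(\mathbb{E}[S^{2}] = {1\over n-1}(n-1)σ^{2} = σ^2\)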

We have shown that the expected value of the sample variance is equal to the population variance. This is interesting because when we calculate the variance of a whole population we divide by n. For a sample, however, you need to divide by n - 1 to get an unbiased estimate of the population variance.
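
To see the effect of the divisor concretely, the sketch below enumerates every possible sample of three fair dice and compares the two choices. The sample size of 3 and the helper name `sum_sq_dev` are our own assumptions, made only for illustration. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
n = 3  # sample size (arbitrary, chosen to keep the enumeration small)

mu = Fraction(sum(faces), 6)
sigma2 = sum((face - mu) ** 2 for face in faces) / 6  # population variance

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)

# Average the sum of squared deviations over all 6**n equally likely samples.
samples = list(product(faces, repeat=n))
mean_ss = sum(sum_sq_dev(s) for s in samples) / len(samples)

e_unbiased = mean_ss / (n - 1)  # divide by n - 1
e_biased = mean_ss / n          # divide by n

print(sigma2, e_unbiased, e_biased)  # 35/12 35/12 35/18
```

Dividing by n - 1 reproduces the population variance exactly, while dividing by n gives a smaller value on average.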
Example

Given the values below, find the best point estimate for the population variance σ².
7.61, 7.17, 9.06, 6.305, 7.805, 7.11, 9.705, 6.11, 8.56, 7.11, 6.455, 9.06

A table is an effective way to find the sample variance. If you are using Excel or any other spreadsheet you can find the variance in a few seconds.

X	(X_i - X̄)²
7.61	(7.61 - 7.67)² = 0.0036
7.17	(7.17 - 7.67)² = 0.25
9.06	(9.06 - 7.67)² = 1.9321
6.305	(6.305 - 7.67)² = 1.863225
7.805	(7.805 - 7.67)² = 0.018225
7.11	(7.11 - 7.67)² = 0.3136
9.705	(9.705 - 7.67)² = 4.141225
6.11	(6.11 - 7.67)² = 2.4336
8.56	(8.56 - 7.67)² = 0.7921
7.11	(7.11 - 7.67)² = 0.3136
6.455	(6.455 - 7.67)² = 1.476225
9.06	(9.06 - 7.67)² = 1.9321
SUM	15.4696

\( S^{2}={Σ_{i} (X_{i}-X̄)^2\over n-1 }\)

n is 12 so n - 1 = 11

\( S^{2}={15.4696 \over 11 }\)

Our estimate for the population variance is approximately 1.4063.
We can also estimate the population standard deviation by \( \sqrt{1.4063}\) ≈ 1.186.
Please note that your estimate of the population variance can never be negative.
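
If you prefer code to a spreadsheet, the arithmetic can be checked with Python's standard library. In this minimal sketch, `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by n; the variable names are our own. Note that these functions use the unrounded sample mean, so the result can differ slightly in the later decimal places from a hand calculation that uses 7.67.

```python
import statistics

data = [7.61, 7.17, 9.06, 6.305, 7.805, 7.11,
        9.705, 6.11, 8.56, 7.11, 6.455, 9.06]

s2 = statistics.variance(data)              # sample variance, divides by n - 1
s = statistics.stdev(data)                  # square root of the sample variance
divide_by_n = statistics.pvariance(data)    # divides by n, shown for contrast

print(round(s2, 4), round(s, 4))  # 1.4063 1.1859
print(divide_by_n < s2)           # True
```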

Estimating the proportion

The population proportion can be estimated by dividing the number of successes (X) by the sample size (n). This can be expressed as:

\( \hat{p} = {X\over n}\).

X is the number of successes and n is the sample size. X is considered to be a binomial random variable with expected value np.

\(\mathbb{E}[\hat{p}] =\mathbb{E}[{X\over n }]\)

You will need to take the constant out.

\(\mathbb{E}[\hat{p}] ={1\over n} \mathbb{E}[X] = {np\over n} = p\)

Hence, you can see that \(\hat{p}\) is an unbiased estimator of p.
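
This result can also be verified numerically for a particular case. The sketch below sums over the binomial distribution directly; the values n = 10 and p = 3/10 are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from math import comb

n = 10               # sample size (hypothetical)
p = Fraction(3, 10)  # true proportion (hypothetical)

# E[p_hat] = sum over k of (k/n) * P(X = k), with X ~ Binomial(n, p).
expected_p_hat = sum(
    Fraction(k, n) * comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n + 1)
)

print(expected_p_hat)  # 3/10
```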

Example - sample of teacher trainees
A survey was conducted using a sample of 150 teacher trainees in a training school to determine what proportion of them view the services provided to them favorably. Of the 150 trainees, 103 responded that they viewed the services provided to them by the authorities as favorable. Find the point estimate of the population proportion.

X = 103 and n = 150.

\( \hat{p} = {X\over n}\).

\( \hat{p} = {103\over 150}\) ≈ 0.687 or 68.7%.

The researchers of this survey can take 0.687, or 68.7%, as the point estimate of the population proportion.
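
The same point estimate, written as a short Python sketch (the variable names are our own):

```python
successes = 103  # trainees who responded favorably
n = 150          # trainees who responded to the survey

# Point estimate of the population proportion.
p_hat = successes / n

print(f"{p_hat:.3f} ({p_hat:.1%})")  # 0.687 (68.7%)
```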

Biased and Non-Biased Estimators - Key takeaways

The sample mean calculated using the formula \({Σ_{i} X_{i}\over n }\) is an unbiased estimator of the population mean.
In order to estimate the population variance with our sample we should use the formula \({Σ_{i} (X_{i}-X̄)^2\over n-1 }\).
\({Σ_{i} (X_{i}-X̄)^2\over n}\) would be a biased estimator of the population variance.
\(\hat{p}\), which is calculated by dividing the number of successes X by the sample size n, is an unbiased estimator of the proportion p.