Have you asked yourself how statisticians determine parameters such as the mean age of an entire country's population? It is obvious that they can't get data from every single member of the population to calculate this statistic. However, they can gather data from small samples from the population, find their mean, and use that as a guide to guessing the parameter for the whole population. This is called point estimation.
Point estimation is the use of a statistic to estimate the value of an unknown population parameter. This contrasts with interval estimation, where we look for a range of values that is likely to contain the true parameter.

Formulas for Point Estimation

Point estimation produces a single value by using sample data to calculate a single statistic that acts as the best estimate for an unknown population parameter.
Here are some common examples of point estimates:

The sample mean X̄ is a point estimate of the population mean μ.

The sample standard deviation s is a point estimate of the population standard deviation σ.

The sample variance s² is a point estimate of the population variance σ².

Similarly, the sample proportion p̂ is a point estimate of the population proportion p.

We will explore four methods for estimating unknown parameters:
Best Unbiased Estimators: This method uses an estimator whose expected value is equal to the parameter.
Maximum Likelihood Estimation: This method chooses the parameter values of a model that maximize the likelihood of the data that were actually observed.
The Method of Moments: This method uses the moments of a distribution to estimate the parameters.
Bayes Estimators: This method combines prior information about an unknown parameter with the observed data, with the goal of minimizing the expected error between the estimator and the actual value of the parameter.

Best unbiased estimator

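Bias = E(θ̂) - θ
An estimator θ̂ is unbiased when its bias is zero, which means that
E(θ̂) = θ
The best unbiased estimator is the unbiased estimator with the smallest variance.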

Sample mean as an estimator of the population mean

\( X̄={Σ_{i} X_{i}\over n }\)
Let us first remind ourselves of some key properties of expected values:
\(\mathbb{E}[X_{1}+X_{2} + ... + X_{n}] = \mathbb{E}[X_{1}]+\mathbb{E}[X_{2}] + ... + \mathbb{E}[X_{n}]\)
This holds for any random variables, whether or not they are independent.
This can be expressed as:
\(\mathbb{E}[Σ_{i} X_{i}] = Σ_{i} \mathbb{E}[X_{i}]\)
\( \mathbb{E}[aY] = a\mathbb{E}[Y]\)
where a is a constant and Y is a random variable.
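Applying these properties to the sample mean, where each \(X_{i}\) has expected value μ:
\(\mathbb{E}[X̄] = \mathbb{E}[{Σ_{i} X_{i}\over n }]\)
\(\mathbb{E}[X̄] ={1\over n } \mathbb{E}[Σ_{i} X_{i}]\)
\(\mathbb{E}[X̄] ={1\over n }Σ_{i}\mathbb{E}[X_{i}]\)
\(\mathbb{E}[X̄] ={1\over n }Σ_{i}μ = {1\over n }nμ\)
\(\mathbb{E}[X̄] =μ\)
So the sample mean is an unbiased estimator of the population mean.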
Example
Given the values below, find the best point estimate for the population mean μ.
7.61, 7.17, 9.06, 6.305, 7.805, 7.11, 9.705, 6.11, 8.56, 7.11, 6.455, 9.06
Solution
Best point estimate for μ
\( X̄={Σ_{i} X_{i}\over n }\)
\( X̄={7.61+7.17+9.06+6.305+7.805+7.11+9.705+6.11+8.56+7.11+6.455+9.06\over 12 }\)
\( X̄={92.06\over 12 }\)
\( X̄ \approx 7.67\)
The best point estimate for the population mean μ is 7.67.
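The calculation above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names `sample` and `x_bar` are illustrative and not part of the original example.

```python
from statistics import mean

# Sample values from the example above.
sample = [7.61, 7.17, 9.06, 6.305, 7.805, 7.11,
          9.705, 6.11, 8.56, 7.11, 6.455, 9.06]

# The sample mean is the point estimate of the population mean.
x_bar = mean(sample)
print(round(x_bar, 2))  # 7.67
```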

Sample variance as an estimator of the population variance

\( S^{2}={Σ_{i} (X_{i}-X̄)^2\over n-1 }\)
\(\mathbb{E}[S^{2}] = \mathbb{E}[{Σ_{i} (X_{i}-X̄)^2\over n-1 }]\)
Let us use the property \( \mathbb{E}[aY]= a\mathbb{E}[Y]\)
\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-X̄)^2]\)
Next we subtract and add μ inside the bracket, which does not change its value.
\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-μ+μ-X̄)^2]\)
Let us expand the brackets
\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-μ)^2-2(X_{i}-μ)(X̄-μ)+(X̄-μ)^2]\)
\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-μ)^2-2Σ_{i}(X_{i}-μ)(X̄-μ)+Σ_{i}(X̄-μ)^2]\)
For the middle term we will divide by n and multiply by n.
\(\mathbb{E}[S^{2}] = {1\over n-1}\mathbb{E}[Σ_{i} (X_{i}-μ)^2-2nΣ_{i}({X_{i}-μ\over n})(X̄-μ)+Σ_{i}(X̄-μ)^2]\)
Since \(Σ_{i}{X_{i}-μ\over n} = X̄-μ\), the middle term becomes \(-2n(X̄-μ)^{2}\). Let us also remember that \(Σ_{i} C = nC\) where C is a constant.
\(\mathbb{E}[S^{2}] = {1\over n-1} [Σ_{i}\mathbb{E}[(X_{i}-μ)^2]-2n\mathbb{E}[(X̄-μ)^{2}]+Σ_{i}\mathbb{E}[(X̄-μ)^2]]\)
\(\mathbb{E}[S^{2}] = {1\over n-1} [Σ_{i}\mathbb{VAR}(X_{i})-2n\mathbb{VAR}(X̄)+n\mathbb{VAR}(X̄)]\)
Here we need to use the property that \(\mathbb{VAR}(X̄) = {\mathbb{VAR}(X_{i})\over n}\), which holds for independent observations.
\(\mathbb{E}[S^{2}] = {1\over n-1}[Σ_{i} \mathbb{VAR}(X_{i})-2n{\mathbb{VAR}(X_{i})\over n}+n{\mathbb{VAR}(X_{i})\over n}]\)
Let us rewrite \(\mathbb{VAR}(X_{i})\) as \(σ^2\)
\(\mathbb{E}[S^{2}] = {1\over n-1}[nσ^{2}-2n{σ^{2}\over n}+σ^{2}]\)
\(\mathbb{E}[S^{2}] = {1\over n-1}[nσ^{2}-2σ^2+σ^{2}]\)
\(\mathbb{E}[S^{2}] = {1\over n-1}[nσ^{2}-σ^{2}]\)
\(\mathbb{E}[S^{2}] = σ^2\)
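The result of this derivation can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings (normally distributed data with σ = 2, samples of size 5, a fixed random seed); none of these values come from the text. It averages the estimator over many samples, once with the divisor n - 1 and once with the divisor n.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

SIGMA = 2.0     # assumed true standard deviation, so sigma^2 = 4
N = 5           # a small sample size makes the bias easy to see
TRIALS = 20000  # number of simulated samples

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

# One sum of squared deviations per simulated sample.
ss = [sum_sq_dev([random.gauss(10.0, SIGMA) for _ in range(N)])
      for _ in range(TRIALS)]

avg_unbiased = sum(s / (N - 1) for s in ss) / TRIALS  # divisor n - 1
avg_biased = sum(s / N for s in ss) / TRIALS          # divisor n

print(avg_unbiased, avg_biased)
```

With these settings the first average should land close to σ² = 4, while the second should land close to (n - 1)σ²/n = 3.2, which is why the divisor n - 1 is used.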
Example
Given the values below, find the best point estimate for the population variance σ².
7.61, 7.17, 9.06, 6.305, 7.805, 7.11, 9.705, 6.11, 8.56, 7.11, 6.455, 9.06

A table is an effective way to find the sample variance. If you are using Excel or any other spreadsheet, you can find the variance in a few seconds.

x	(X_i-X̄)²
7.61	(7.61 - 7.67)² = 0.0036
7.17	(7.17 - 7.67)² = 0.25
9.06	(9.06 - 7.67)² = 1.9321
6.305	(6.305 - 7.67)² = 1.863225
7.805	(7.805 - 7.67)² = 0.018225
7.11	(7.11 - 7.67)² = 0.3136
9.705	(9.705 - 7.67)² = 4.141225
6.11	(6.11 - 7.67)² = 2.4336
8.56	(8.56 - 7.67)² = 0.7921
7.11	(7.11 - 7.67)² = 0.3136
6.455	(6.455 - 7.67)² = 1.476225
9.06	(9.06 - 7.67)² = 1.9321
SUM	15.4696

\( S^{2}={Σ_{i} (X_{i}-X̄)^2\over n-1 }\)
n is 12 so n - 1 = 11
\( S^{2}={15.4696 \over 11 } \approx 1.4063\)
Our estimate for the population variance is approximately 1.4063.
We can also estimate the population standard deviation by \( \sqrt{1.4063}\) ≈ 1.1859.
Please note that an estimate of the population variance can never be negative.
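The table calculation can also be done in code rather than in a spreadsheet. This is a minimal sketch using Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor from the formula above; the variable names are illustrative.

```python
from statistics import stdev, variance

# Sample values from the example above.
sample = [7.61, 7.17, 9.06, 6.305, 7.805, 7.11,
          9.705, 6.11, 8.56, 7.11, 6.455, 9.06]

s2 = variance(sample)  # sample variance, divisor n - 1
s = stdev(sample)      # sample standard deviation, square root of s2

print(s2, s)
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead, so they are not the estimators used in this section.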

Estimating the proportion

The population proportion can be estimated by dividing the number of successes (x) by the sample size (n). This can be expressed as:
\( p̂ = {x\over n}\).
x is the number of successes and n is the sample size.
Example - sample of teacher trainees
A survey was conducted among 300 teacher trainees in a training school to determine what proportion of them view the services provided to them favorably. Of the 150 trainees who responded, 103 said that they viewed the services provided to them by the authorities as favorable. Find the point estimate of the population proportion.

x = 103 and n = 150.
\( p̂ = {x\over n}\).
\( p̂ = {103\over 150}\) ≈ 0.687 or 68.7%. The researchers can therefore take 0.687, or 68.7%, as the point estimate of the population proportion. Point estimation is the form of statistical inference in which we estimate the unknown parameter of interest using a single value based on the sample data (hence the name point estimate).
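The same calculation as a short Python sketch, with illustrative variable names:

```python
successes = 103  # trainees who viewed the services favorably
n = 150          # trainees counted in the sample

# Point estimate of the population proportion.
p_hat = successes / n
print(f"{p_hat:.3f}")  # 0.687
```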

Maximum Likelihood Estimation

The likelihood function can be expressed as the following: L = \( f(x_{1},x_{2},x_{3},...,x_{n}|θ)\)
Since the observations are all independent, L can be expressed as:
L = \( f(x_{1}|θ)f(x_{2}|θ)f(x_{3}|θ)..f(x_{n}|θ) \)
To find the maximum we need to solve
\({dL\over dθ} = 0 \)
Since L is positive, we can divide both sides by L:
\( {1\over L}{dL\over dθ} = 0 \)
The left-hand side is the derivative of log(L), so it is usually easier to solve
\({d\log(L)\over dθ} = 0 \)
Example - flipping an unfair coin
An unfair coin is flipped 100 times and 61 heads are observed. What is the maximum likelihood estimate (MLE) of p, the probability of a head, when nothing is previously known about the coin?
The likelihood is P(H = 61|p).
We want to find the value of p that makes this probability a maximum.
\({dP(H = 61|p)\over dp} = 0\).
\(P(H = 61|p) = {}^{100}C_{61}p^{61}(1-p)^{39}\)
Taking logarithms:
\(\log P(H = 61|p) = \log({}^{100}C_{61}) + 61\log p + 39\log(1-p) \)
After differentiating with respect to p and setting the result to zero:
\(61 {1\over p} -39{1\over 1-p} = 0 \)
61(1-p) -39p = 0
61 - 100p = 0
p = \({61\over 100}\)
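The analytic answer can be cross-checked numerically. The sketch below evaluates the log-likelihood on a grid of candidate values for p and picks the largest. The grid search and the function name `log_likelihood` are illustrative choices, not part of the derivation above.

```python
import math

def log_likelihood(p, heads=61, flips=100):
    """Binomial log-likelihood of p for the observed coin flips."""
    return (math.log(math.comb(flips, heads))
            + heads * math.log(p)
            + (flips - heads) * math.log(1 - p))

# Candidates 0.001, 0.002, ..., 0.999 (endpoints left out to avoid log(0)).
grid = [k / 1000 for k in range(1, 1000)]

# The MLE is the candidate with the largest log-likelihood.
p_mle = max(grid, key=log_likelihood)
print(p_mle)  # 0.61
```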

Example - Bernoulli random variables

The Bernoulli distribution gives the probability of an event happening \((x_{i} = 1)\) or not happening \((x_{i} = 0)\), such as a coin landing on heads. Here the parameter θ is p, the probability that the event happens.
\( f(x_{i} |θ)= p^{x_{i}}(1-p)^{1- x_{i}}\)
L = \( f(x_{1}|θ)f(x_{2}|θ)f(x_{3}|θ)..f(x_{n}|θ) \)
L = \(p^{x_{1}}(1-p)^{1-x_{1}}p^{x_{2}}(1-p)^{1-x_{2}}..p^{x_{n}}(1-p)^{1-x_{n}}\)
\(L = p^{\Sigma_{i}x_{i}}(1-p)^{n-\Sigma_{i}x_{i}}\)
Using properties of logarithms we can express \(\log(L)\) as the following:
\(\log(L) = \Sigma_{i}x_{i} \log(p) + (n-\Sigma_{i}x_{i})\log(1-p)\)
Now we need to differentiate and set the derivative to zero to find the parameter that maximises the likelihood.
\({d\log(L)\over dp} = {\Sigma_{i}x_{i}\over p}-{n-\Sigma_{i}x_{i}\over (1-p)} = 0 \)
\((1-p)\Sigma_{i}x_{i} - p[n-\Sigma_{i}x_{i}] = 0\)
\(\Sigma_{i}x_{i} - pn = 0\)
p = \( {\Sigma_{i}x_{i}\over n}\)
You may be aware that the mean of the Bernoulli distribution is p. Here we can see that the maximum likelihood estimate of the population mean of a Bernoulli distribution is the sample mean.
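As a quick check of this result, the sketch below simulates Bernoulli data and confirms that the derivative of log(L) is zero at the sample mean. The true value `TRUE_P = 0.3`, the sample size and the seed are assumptions made only to generate data.

```python
import random

random.seed(1)
TRUE_P = 0.3  # assumed value, used only to simulate data

# 1 = the event happened, 0 = it did not.
xs = [1 if random.random() < TRUE_P else 0 for _ in range(1000)]

def score(p, data):
    """Derivative of the Bernoulli log-likelihood with respect to p."""
    s, n = sum(data), len(data)
    return s / p - (n - s) / (1 - p)

p_hat = sum(xs) / len(xs)  # MLE: sum of x_i divided by n
print(p_hat, score(p_hat, xs))
```

The second printed value should be zero up to floating-point rounding, which matches the condition used in the derivation.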

Example - Binomial random variables

The Binomial distribution gives the probability of \(x_{i}\) events occurring in n trials, such as 10 out of 20 coins showing heads. In this example the number of observations is taken to be the same as the number of trials n in each observation, which is why \(n^{2}\) appears below.
\( f(x_{i} |θ)={}^{n}C_{x_{i}}p^{x_{i}}(1-p)^{n- x_{i}}\)
L = \(f(x_{1}|θ)f(x_{2}|θ)f(x_{3}|θ)..f(x_{n}|θ) \)
L = \([{}^{n}C_{x_{1}}p^{x_{1}}(1-p)^{n-x_{1}}][{}^{n}C_{x_{2}}p^{x_{2}}(1-p)^{n-x_{2}}]..[{}^{n}C_{x_{n}}p^{x_{n}}(1-p)^{n-x_{n}}]\)
\(L = [{}^{n}C_{x_{1}}][{}^{n}C_{x_{2}}]..[{}^{n}C_{x_{n}}][p^{\Sigma_{i}x_{i}}](1-p)^{n^{2}-\Sigma_{i}x_{i}}\)
Using properties of logarithms we can express \(\log(L)\) as the following:
\(\log(L) =\log([{}^{n}C_{x_{1}}][{}^{n}C_{x_{2}}]..[{}^{n}C_{x_{n}}]) + \Sigma_{i}x_{i} \log(p) + (n^{2}-\Sigma_{i}x_{i})\log(1-p)\)
Now we need to differentiate and set the derivative to zero to find the parameter that maximises the likelihood. The first term does not depend on p, so it disappears.
\({d\log(L)\over dp} = {\Sigma_{i}x_{i}\over p}-{n^{2}-\Sigma_{i}x_{i}\over (1-p)} = 0 \)
\((1-p)\Sigma_{i}x_{i} - p[n^2-\Sigma_{i}x_{i}] = 0\)
\(\Sigma_{i}x_{i} - pn^2 = 0\)
np = \( {\Sigma_{i}x_{i}\over n}\)
The mean of the Binomial distribution is np. Here we can see that the maximum likelihood estimate of the population mean of a Binomial distribution is the sample mean.
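The sketch below mirrors the setup of this derivation, with n observations that each count successes in n trials. The values `TRUE_P = 0.4` and `N = 20`, the seed and the helper `binomial_draw` are assumptions made for illustration.

```python
import random

random.seed(2)
TRUE_P = 0.4  # assumed value, used only to simulate data
N = 20        # trials per observation and also the number of observations

def binomial_draw(n, p):
    """Number of successes in n independent trials with success probability p."""
    return sum(1 for _ in range(n) if random.random() < p)

xs = [binomial_draw(N, TRUE_P) for _ in range(N)]

p_hat = sum(xs) / N**2  # from sum(x_i) - p * n^2 = 0
mean_hat = sum(xs) / N  # sample mean, the estimate of the mean n * p

print(p_hat, mean_hat)
```

If the number of observations differed from the number of trials, the denominator for `p_hat` would be their product rather than `N**2`.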

Moments to estimate parameters

We can use moments to estimate parameters, but first let us remind ourselves of moment generating functions.
\(e^{t}\) can be expressed as a power series using Maclaurin's series.
\(e^{t} = 1 + t + {t^{2} \over 2!} + {t^{3} \over 3!} +....\)
The moment generating function of X is defined as
\(M_{X}(t) = E[e^{tX}]\)
\(M_{X}(t) = E[1 + (tX) + {(tX)^{2} \over 2!} + {(tX)^{3} \over 3!} + .... ]\)
Differentiating with respect to t and setting t = 0, we can show that
\(E(X) = M'_{X}(0)\)
\(E(X^2) = M''_{X}(0)\)
\( Var(X) = E(X^2) - [E(X)]^{2} = M''_{X}(0) - (M'_{X}(0))^{2}\)
E[X] is the first moment \(μ_{1}\).
We can estimate it using the sample:
\( μ̂_{1} = {\Sigma X_{i} \over n} \)
Similarly, \(E[X^{k}] \) is the kth moment \(μ_{k}\).
Using the sample we can estimate the kth moment:
\( μ̂_{k} = {\Sigma X_{i}^{k} \over n} \)
Example - Gamma distribution (a, λ)
For a Gamma distribution with shape a and rate λ, the first two moments are
\( μ_{1} = {a \over λ} \)
\(μ_{2} = {a \over λ^{2}} + {a^{2} \over λ^{2}} \)
Solving to find a and λ:
\(a = {μ_{1}^2 \over μ_{2}-μ_{1}^2 }\)
\(λ = {μ_{1} \over μ_{2}-μ_{1}^2 } \)
Since we may not know the true moments, we need to use our estimators \( μ̂_{1}\) and \( μ̂_{2}\) to estimate the parameters.
\(\hat{a} = {μ̂_{1}^2 \over μ̂_{2}-μ̂_{1}^2 }\)
\(\hat{λ} = {μ̂_{1} \over μ̂_{2}-μ̂_{1}^2 } \)
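These formulas can be tried out on simulated data. The sketch below draws values from a Gamma distribution with assumed parameters (a = 2, λ = 0.5, chosen only for illustration) and recovers them from the first two sample moments. Note that Python's `random.gammavariate` takes a shape and a scale, so the scale is passed as 1/λ.

```python
import random

random.seed(3)
TRUE_A, TRUE_LAMBDA = 2.0, 0.5  # assumed shape and rate, for simulation only

# gammavariate(alpha, beta) uses shape alpha and SCALE beta = 1 / rate.
xs = [random.gammavariate(TRUE_A, 1 / TRUE_LAMBDA) for _ in range(50000)]

m1 = sum(xs) / len(xs)                 # first sample moment
m2 = sum(x * x for x in xs) / len(xs)  # second sample moment

# Method of moments estimates from the formulas above.
a_hat = m1**2 / (m2 - m1**2)
lambda_hat = m1 / (m2 - m1**2)

print(a_hat, lambda_hat)
```

With a sample this large, the two estimates should land close to the assumed values of 2 and 0.5.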

Bayes Estimator

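A Bayes estimator treats the unknown parameter as a random variable: we start from a prior distribution over its possible values and update it with the observed data. It is useful when we are unsure which of several candidate distributions generated the data.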
Example- 3 coins with different levels of fairness
Suppose I have three coins in my wallet:

a biased coin 3:1 in favour of tails
a fair coin
a biased coin 3:1 in favour of heads

Let X = 1 denote the event that I observe a head, and X = 0 the event that I observe a tail.
Let θ denote the probability of a head: (0.25, 0.5, 0.75)
Prior: p(θ = 0.25) = p(θ = 0.5) = p(θ = 0.75) = 1/3 ≈ 0.33
Probability mass function: \(p(x|θ) = θ^{x}(1-θ)^{1-x} \)
\[ \begin{array}{|c|c|c|c|c|c|} \hline \text{coin} & θ & p(θ) & p(X = 1|θ) & p(X = 1|θ)p(θ) & {p(X = 1|θ)p(θ)\over p(X=1)} \\ \hline 1 & 0.25 & {1\over 3} & 0.25 & ({1\over 3})(0.25)=0.0833 & {0.0833\over 0.5}=0.167 \\ \hline 2 & 0.5 & {1\over 3} & 0.50 & ({1\over 3})(0.50)=0.1667 & {0.1667\over 0.5}=0.333 \\ \hline 3 & 0.75 & {1\over 3} & 0.75 & ({1\over 3})(0.75)=0.25 & {0.25\over 0.5} =0.5 \\ \hline & \text{SUM} & 1 & & 0.5 & 1 \\ \hline \end{array} \]
\(p(X = 1) = p(X = 1 ∩ θ_{1})+p(X = 1 ∩ θ_{2})+p(X = 1 ∩ θ_{3})\)
Alternatively using conditional probability.
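\(p(X = 1) = Σ_{i}\,p(X = 1|θ_{i})p(θ_{i})\)
\(p(X=1) = p(X = 1|θ_{1})p(θ_{1})+p(X = 1|θ_{2})p(θ_{2})+p(X = 1|θ_{3})p(θ_{3})\)
\(p(X=1) = (0.25){1\over 3}+(0.5){1\over 3}+(0.75){1\over 3} = 0.5\)
We can see that the probability of observing a head is 0.5. Given that a head is observed, the posterior probability that the coin is the one with θ = 0.25 is 0.167, compared with 0.333 for the fair coin and 0.5 for the coin with θ = 0.75.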
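The table calculation for the three coins can be reproduced in code. This is a minimal sketch that uses exact fractions to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]  # P(head) per coin
prior = [Fraction(1, 3)] * 3                               # equal prior weights

# Joint probability of (head, coin i), then normalise by P(X = 1).
joint = [t * p for t, p in zip(thetas, prior)]
p_head = sum(joint)
posterior = [j / p_head for j in joint]

print(p_head)     # 1/2
print(posterior)  # [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
```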
Example 2 - the number of customers entering Starbucks
It is believed that the number of customers entering a Starbucks coffee house in a given time interval follows a Poisson distribution with a mean rate λ equal to either 3 or 5. Prior to collecting any data, a Starbucks data collector believes that it is much more likely that the rate is λ = 3 than λ = 5. In fact, the data collector believes that the prior probabilities are: P(λ = 3) = 0.7 and P(λ = 5) = 0.3
One day, during a randomly selected time interval, the data collector observes x = 7 customers enter Starbucks. In light of this observation, what is the probability that λ = 3, and what is the probability that λ = 5?
The Poisson probabilities are \(p(X = 7|λ) = {e^{-λ}λ^{7}\over 7!}\), which gives 0.0216 for λ = 3 and 0.1044 for λ = 5.
\[ \begin{array}{|c|c|c|c|c|c|} \hline & λ & p(λ) & p(X = 7|λ) & p(X = 7|λ)p(λ) & {p(X = 7|λ)p(λ)\over p(X=7)} \\ \hline 1 & 3 & 0.7 & 0.0216 & 0.0216(0.7) = 0.01512 & {0.01512\over 0.04644} = 0.326 \\ \hline 2 & 5 & 0.3 & 0.1044 & 0.1044(0.3) = 0.03132 & {0.03132\over 0.04644}= 0.674 \\ \hline & \text{SUM} &1 & & 0.04644 &1 \\ \hline \end{array} \]
\(p(X=7) = p(X = 7 ∩ λ = 3) +p(X = 7 ∩ λ = 5)\)
Alternatively using conditional probability.
\(p(X=7) = p(X = 7|λ = 3)p(λ = 3)+p(X = 7|λ = 5)p(λ = 5)\)
p(X=7) = 0.0216(0.7) + 0.1044(0.3) = 0.04644
Given the observation, the posterior probability that λ = 3 is about 0.33, and the posterior probability that λ = 5 is about 0.67.
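The same posterior update can be sketched in Python. The helper `poisson_pmf` and the dictionary layout are illustrative choices; the priors and the observation x = 7 are taken from the example.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

prior = {3: 0.7, 5: 0.3}  # prior probabilities for each candidate rate
x = 7                     # observed number of customers

# Likelihood times prior for each rate, then normalise by P(X = 7).
joint = {lam: poisson_pmf(x, lam) * p for lam, p in prior.items()}
evidence = sum(joint.values())
posterior = {lam: j / evidence for lam, j in joint.items()}

print(posterior)
```

Because the code does not round intermediate values, its output can differ from hand calculations in the third decimal place, but the posterior probabilities still come out at about 0.33 for λ = 3 and 0.67 for λ = 5.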

Point Estimation - Key takeaways

Point estimation is the use of a statistic to estimate the value of an unknown parameter of a population.
There are two main types of estimation in statistics: point estimation and interval estimation.
Methods of point estimation are:
The Method of Moments
Maximum Likelihood Estimation
Bayes Estimators
Best Unbiased Estimators
Point estimation matters for the following reasons:
Most statistical calculations are based on point estimates.
Point estimates are used in significance testing formulas.
A point estimate produces a single value that acts as a "best" estimate for an unknown population parameter.