Have you asked yourself how statisticians determine parameters such as the mean age of an entire country's population? It is obvious that they can't get data from every single member of the population to calculate this statistic. However, they can gather data from small samples from the population, find their mean, and use that as a guide to guessing the parameter for the whole population. This is called point estimation.
Point estimation is the use of statistics to estimate the value of an unknown parameter of a population. This contrasts with interval estimation, where we look for a range of values that is likely to contain the true parameter.
Point estimation produces a single value by using sample data to calculate a single statistic that acts as the best estimate for an unknown population parameter. Here are some common examples of point estimates:
| The sample mean X̄ is a point estimate of the population mean μ. |
| The sample standard deviation s is a point estimate of the population standard deviation σ. |
| The sample variance \(s^{2}\) is a point estimate of the population variance \(σ^{2}\). |
| Similarly, the sample proportion p̂ is a point estimate of the population proportion p. |
We will explore 4 methods to estimate the unknown parameters:
Best Unbiased Estimators: This method uses an estimator whose expected value is equal to the parameter and which, among such estimators, has the smallest variance.
Maximum Likelihood Estimation: This method chooses the parameter values of a model that maximise the likelihood that the process described by the model generated the data that were actually observed.
The Method of Moments: This method uses the moments of a distribution to estimate the parameters.
Bayes Estimators: This method uses prior information to estimate an unknown parameter with the goal of minimising the error between the estimator and the actual value of the parameter.
\( X̄={Σ_{i} X_{i}\over n }\)
Let us firstly remind ourselves of some key properties of expected values:
\(\mathbb{E}[X_{1}+X_{2} + ... + X_{n}] = \mathbb{E}[X_{1}]+\mathbb{E}[X_{2}] + ... + \mathbb{E}[X_{n}]\)
This is true whether or not the random variables are independent.
This can be expressed as:
\(\mathbb{E}[Σ_{i} X_{i}] = Σ_{i} \mathbb{E}[X_{i}]\)
\( \mathbb{E}[aY] = a\mathbb{E}[Y]\)
where a is a constant and Y is a random variable.
\(\mathbb{E}[X̄] = \mathbb{E}[{Σ_{i} X_{i}\over n }]\)
\(\mathbb{E}[X̄] ={1\over n } \mathbb{E}[Σ_{i} X_{i}]\)
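\(\mathbb{E}[X̄] ={1\over n }Σ_{i}\mathbb{E}[X_{i}]\)
Each \(X_{i}\) is drawn from the population, so \(\mathbb{E}[X_{i}] = μ\):
\(\mathbb{E}[X̄] ={1\over n }Σ_{i}μ = {1\over n }nμ\)
\(\mathbb{E}[X̄] =μ\)
The expected value of the sample mean equals the population mean, so X̄ is an unbiased estimator of μ.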
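The result can be checked exactly on a small case. The sketch below is only an illustration: the three-valued distribution `dist` is a made-up assumption, not data from this article. It enumerates every possible sample of size n, weights each sample mean by its probability, and compares the total with μ.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete population: value -> probability (assumed for illustration).
dist = {1: Fraction(1, 2), 2: Fraction(1, 3), 6: Fraction(1, 6)}


def population_mean(dist):
    """Return mu = sum of value * probability."""
    return sum(value * prob for value, prob in dist.items())


def expected_sample_mean(dist, n):
    """Exact E[sample mean] for n independent draws, by full enumeration."""
    total = Fraction(0)
    for sample in product(dist.items(), repeat=n):
        prob = Fraction(1)
        for _, p in sample:
            prob *= p  # independent draws: probabilities multiply
        mean = Fraction(sum(value for value, _ in sample), n)
        total += prob * mean
    return total


mu = population_mean(dist)
e_xbar = expected_sample_mean(dist, 3)
print(mu, e_xbar)  # the two values should agree
```

Exact fractions are used so that the comparison is not blurred by floating-point rounding.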
Example
Given the values below, find the best point estimate for the population mean μ.
7.61, 7.17, 9.06, 6.305, 7.805, 7.11, 9.705, 6.11, 8.56, 7.11, 6.455, 9.06
Solution
Best point estimate for μ
\( X̄={Σ_{i} X_{i}\over n }\)
\( X̄={7.61+7.17+9.06+6.305+7.805+7.11+9.705+6.11+8.56+7.11+6.455+9.06\over 12 }\)
\( X̄={92.06\over 12 }\)
\( X̄ \approx 7.67\)
The best point estimate for the population mean μ is 7.67 (to two decimal places).
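The same calculation can be reproduced in a few lines of Python. This is a minimal sketch using the twelve values from the example; the variable names are illustrative.

```python
from statistics import fmean

# The twelve observations from the example.
data = [7.61, 7.17, 9.06, 6.305, 7.805, 7.11,
        9.705, 6.11, 8.56, 7.11, 6.455, 9.06]

x_bar = fmean(data)          # sample mean: sum of the values divided by n
print(round(sum(data), 2))   # total of the observations
print(round(x_bar, 2))       # point estimate of the population mean
```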
| x | \((X_{i}-X̄)^{2}\) |
| 7.61 | \((7.61-7.67)^{2}\) |
| 7.17 | \((7.17-7.67)^{2}\) |
| 9.06 | \((9.06-7.67)^{2}\) |
| 6.305 | \((6.305-7.67)^{2}\) |
| 7.805 | \((7.805-7.67)^{2}\) |
| 7.11 | \((7.11-7.67)^{2}\) |
| 9.705 | \((9.705-7.67)^{2}\) |
| 6.11 | \((6.11-7.67)^{2}\) |
| 8.56 | \((8.56-7.67)^{2}\) |
| 7.11 | \((7.11-7.67)^{2}\) |
| 6.455 | \((6.455-7.67)^{2}\) |
| 9.06 | \((9.06-7.67)^{2}\) |
| SUM | 15.4696 |
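The table of squared deviations is the usual first step towards the sample variance \(s^{2}\) and the sample standard deviation s. The sketch below recomputes the column sum and then, as an assumption about where the example is heading, divides by n - 1 to get \(s^{2}\). The variable names are illustrative.

```python
from math import sqrt

data = [7.61, 7.17, 9.06, 6.305, 7.805, 7.11,
        9.705, 6.11, 8.56, 7.11, 6.455, 9.06]
n = len(data)

# The worked example rounds the sample mean to two decimal places.
x_bar = round(sum(data) / n, 2)

squared_devs = [(x - x_bar) ** 2 for x in data]
sum_sq = sum(squared_devs)    # the SUM row of the table

s2 = sum_sq / (n - 1)         # sample variance (assumes the n - 1 divisor)
s = sqrt(s2)                  # sample standard deviation

print(sum_sq, s2, s)
```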
The population proportion can be estimated by dividing the number of successes (x) by the sample size (n). This can be expressed as:
\( p̂ = {x\over n}\)
where x is the number of successes and n is the sample size.
Example - sample of teacher trainees
A survey was conducted using a sample of 150 teacher trainees in a training school to determine what proportion of them view the services provided to them favorably. Of the 150 trainees, 103 responded that they viewed the services provided to them by the authorities as favorable. Find the point estimate of the population proportion.
Solution
x = 103 and n = 150.
\( p̂ = {x\over n}\)
\( p̂ = {103\over 150} \approx 0.687\)
The researchers can take 0.687, or 68.7%, as the point estimate of the population proportion.
Point estimation is the form of statistical inference in which we estimate the unknown parameter of interest using a single value based on the sample data (hence the name point estimate).
The likelihood function can be expressed as the following:
\( L = f(x_{1},x_{2},x_{3},...,x_{n}|θ)\)
Since the observations are all independent, L can be expressed as:
\( L = f(x_{1}|θ)f(x_{2}|θ)f(x_{3}|θ)...f(x_{n}|θ) \)
To find the maximum we need to solve
\({dL\over dθ} = 0 \)
Since L is positive, we can divide by L:
\( {1\over L}{dL\over dθ} = 0 \)
which is the same as maximising the log-likelihood:
\({d\log(L)\over dθ} = 0 \)
Example - flipping an unfair coin
An unfair coin is flipped 100 times and 61 heads are observed. What is the MLE of p, the probability of a head, when nothing is previously known about the coin?
We want to find the value of p that maximises the probability P(H = 61|p), so we solve
\({dP(H = 61|p)\over dp} = 0\)
\(P(H = 61|p) = {}^{100}C_{61}p^{61}(1-p)^{39}\)
Taking logarithms:
\(\log P(H = 61|p) = \log {}^{100}C_{61} + 61\log p + 39\log(1-p) \)
After differentiation:
\(61 {1\over p} -39{1\over 1-p} = 0 \)
\(61(1-p) -39p = 0\)
\(p = {61\over 100}\)
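The coin example can also be checked numerically. The sketch below evaluates the log-likelihood on a grid of candidate values of p and picks the largest. The grid spacing of 0.001 and the function name are arbitrary choices for illustration; the calculus above gives the exact answer.

```python
from math import comb, log


def log_likelihood(p, heads=61, flips=100):
    """Log of the binomial probability of `heads` heads in `flips` flips."""
    return (log(comb(flips, heads))
            + heads * log(p)
            + (flips - heads) * log(1 - p))


# Candidate values 0.001, 0.002, ..., 0.999 (the end points would give log(0)).
grid = [k / 1000 for k in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
print(p_hat)
```

A grid search is crude, but it makes the idea visible: the MLE is simply the value of p under which the observed data are most probable.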
The Bernoulli distribution gives the probability of an event happening \((x_{i} = 1)\) or not happening \((x_{i} = 0)\), such as a coin landing on heads.
\( f(x_{i} |θ)= p^{x_{i}}(1-p)^{1- x_{i}}\)
\( L = f(x_{1}|θ)f(x_{2}|θ)f(x_{3}|θ)...f(x_{n}|θ) \)
\( L = p^{x_{1}}(1-p)^{1-x_{1}}p^{x_{2}}(1-p)^{1-x_{2}}...p^{x_{n}}(1-p)^{1-x_{n}}\)
\(L = p^{\Sigma_{i}x_{i}}(1-p)^{n-\Sigma_{i}x_{i}}\)
Using properties of logarithms we can express \(\log(L)\) as the following:
\(\log(L) = \Sigma_{i}x_{i} \log(p) + (n-\Sigma_{i}x_{i})\log(1-p)\)
Now we need to differentiate and set the derivative equal to zero to find the parameter that maximises the likelihood.
\({d\log(L)\over dp} = {\Sigma_{i}x_{i}\over p}-{n-\Sigma_{i}x_{i}\over (1-p)} = 0\)
\((1-p)\Sigma_{i}x_{i} - p[n-\Sigma_{i}x_{i}] = 0\)
\(\Sigma_{i}x_{i} - pn = 0\)
\(p̂ = {\Sigma_{i}x_{i}\over n}\)
You may be aware that the mean of the Bernoulli distribution is p. Here we can see that the maximum likelihood estimate of the population mean of a Bernoulli distribution is the sample mean.
The Binomial distribution gives the probability of \(x_{i}\) events occurring in n trials, such as 10 out of 20 coins showing heads. In this derivation we take n observations, each made up of n trials.
\( f(x_{i} |θ)={}^{n}C_{x_{i}}p^{x_{i}}(1-p)^{n- x_{i}}\)
\( L = f(x_{1}|θ)f(x_{2}|θ)f(x_{3}|θ)...f(x_{n}|θ) \)
\( L = [{}^{n}C_{x_{1}}p^{x_{1}}(1-p)^{n-x_{1}}][{}^{n}C_{x_{2}}p^{x_{2}}(1-p)^{n-x_{2}}]...[{}^{n}C_{x_{n}}p^{x_{n}}(1-p)^{n-x_{n}}]\)
\(L = [{}^{n}C_{x_{1}}][{}^{n}C_{x_{2}}]...[{}^{n}C_{x_{n}}][p^{\Sigma_{i}x_{i}}](1-p)^{n^{2}-\Sigma_{i}x_{i}}\)
Using properties of logarithms we can express \(\log(L)\) as the following:
\(\log(L) =\log[{}^{n}C_{x_{1}}{}^{n}C_{x_{2}}...{}^{n}C_{x_{n}}] + \Sigma_{i}x_{i} \log(p) + (n^{2}-\Sigma_{i}x_{i})\log(1-p)\)
Now we need to differentiate and set the derivative equal to zero to find the parameter that maximises the likelihood.
\({d\log(L)\over dp} = {\Sigma_{i}x_{i}\over p}-{n^{2}-\Sigma_{i}x_{i}\over (1-p)} = 0\)
\((1-p)\Sigma_{i}x_{i} - p[n^2-\Sigma_{i}x_{i}] = 0\)
\(\Sigma_{i}x_{i} - pn^2 = 0\)
\(np = {\Sigma_{i}x_{i}\over n}\)
The mean of the Binomial distribution is np, so here too we can see that to estimate the population mean we need the sample mean.
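The closed-form result can be checked with a short sketch. The derivation uses n both for the number of trials in each observation and for the number of observations; the code keeps the two apart (`trials` and `len(counts)`), which is an assumption made for clarity. The counts are made-up illustrative data.

```python
def binomial_mle(counts, trials):
    """MLE of p: total successes divided by total number of trials."""
    return sum(counts) / (len(counts) * trials)


def score(p, counts, trials):
    """Derivative of the log-likelihood with respect to p."""
    successes = sum(counts)
    failures = len(counts) * trials - successes
    return successes / p - failures / (1 - p)


# Hypothetical data: heads observed in 4 batches of 20 coin flips each.
counts = [10, 12, 9, 11]
trials = 20

p_hat = binomial_mle(counts, trials)
print(p_hat)                         # estimate of p
print(trials * p_hat)                # estimate of the mean np
print(score(p_hat, counts, trials))  # should be (close to) zero at the maximum
```

When `trials` equals `len(counts)`, the denominator becomes n squared, matching the derivation.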
We can use moments to estimate parameters, but first let us remind ourselves of moment generating functions.
\(e^{t}\) can be expressed as a power series using Maclaurin's series.
\(e^{t} = 1 + t + {t^{2} \over 2!} + {t^{3} \over 3!} + ...\)
\(M_{X}(t) = E[e^{tX}]\)
\(M_{X}(t) = E[1 + (tX) + {(tX)^{2} \over 2!} + {(tX)^{3} \over 3!} + ... ]\)
Differentiating with respect to t and setting t = 0, we can show that:
\(E(X) = M'_{X}(0)\)
\(E(X^2) = M''_{X}(0)\)
\( Var(X) = E(X^2) - [E(X)]^{2} = M''_{X}(0) - (M'_{X}(0))^{2}\)
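These relationships can be illustrated numerically. The sketch below uses the Bernoulli distribution as an example, whose moment generating function is \(M_{X}(t) = 1 - p + pe^{t}\), and approximates the derivatives at t = 0 with central finite differences. The value p = 0.3, the step size h and the function names are assumptions chosen for the illustration.

```python
from math import exp


def mgf_bernoulli(t, p=0.3):
    """Moment generating function of a Bernoulli(p) variable: E[e^(tX)]."""
    return (1 - p) + p * exp(t)


def first_derivative_at_zero(f, h=1e-3):
    """Central-difference approximation of f'(0)."""
    return (f(h) - f(-h)) / (2 * h)


def second_derivative_at_zero(f, h=1e-3):
    """Central-difference approximation of f''(0)."""
    return (f(h) - 2 * f(0.0) + f(-h)) / (h * h)


mean = first_derivative_at_zero(mgf_bernoulli)             # E(X) = M'(0)
second_moment = second_derivative_at_zero(mgf_bernoulli)   # E(X^2) = M''(0)
variance = second_moment - mean ** 2                       # M''(0) - (M'(0))^2

print(mean, second_moment, variance)
```

For a Bernoulli variable the exact values are E(X) = p, E(X^2) = p and Var(X) = p(1 - p), so the printed numbers should be close to 0.3, 0.3 and 0.21.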
E[X] is the first moment \(μ_{1}\).
We can estimate it using the sample mean:
\( μ̂_{1} = {\Sigma X_{i} \over n} \)
Similarly, \(E[X^{k}]\) is the kth moment \(μ_{k}\).
Using the sample we can estimate the kth moment:
\( μ̂_{k} = {\Sigma X_{i}^{k} \over n} \)
Example - Gamma distribution with shape a and rate λ
\( μ_{1} = {a \over λ} \)
\(μ_{2} = {a \over λ^{2}} + {a^{2} \over λ^{2}} \)
Solving these two equations for a and λ:
\(a = {μ_{1}^2 \over μ_{2}-μ_{1}^2 }\)
\(λ = {μ_{1} \over μ_{2}-μ_{1}^2 } \)
Since we may not know the true moments, we need to use our estimators \( μ̂_{1}\) and \( μ̂_{2}\) to find the parameters.
\(â = {μ̂_{1}^2 \over μ̂_{2}-μ̂_{1}^2 }\)
\(λ̂ = {μ̂_{1} \over μ̂_{2}-μ̂_{1}^2 } \)
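The Gamma example translates directly into code. The sketch below computes the first two sample moments and plugs them into the formulas for a and λ. The function names and the four data values are illustrative assumptions, not part of the original example.

```python
def sample_moment(data, k):
    """k-th sample moment: the average of x**k over the sample."""
    return sum(x ** k for x in data) / len(data)


def gamma_method_of_moments(data):
    """Method-of-moments estimates (a_hat, lambda_hat) for a Gamma(a, lambda)."""
    m1 = sample_moment(data, 1)
    m2 = sample_moment(data, 2)
    spread = m2 - m1 ** 2        # estimate of the variance a / lambda^2
    a_hat = m1 ** 2 / spread
    lam_hat = m1 / spread
    return a_hat, lam_hat


# Hypothetical observations, chosen so the arithmetic is easy to follow.
data = [1.0, 2.0, 3.0, 4.0]
a_hat, lam_hat = gamma_method_of_moments(data)
print(a_hat, lam_hat)
```

With these values the first two sample moments are 2.5 and 7.5, so the estimates come out as a = 5 and λ = 2.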
The Bayes estimator is a method for estimating parameters when we are unsure which distribution we are using.
Example - 3 coins with different levels of fairness
Suppose I have three coins in my wallet: