You have been assigned the task of finding the percentage of the population that is shorter than you. With a round-the-world ticket in hand, you set off on an adventure to visit every country and record everyone's height. You discover that the average height is 5 foot 9 and even manage to calculate the population variance. You can now draw a bell-shaped curve and label the key parameters of this normal distribution. Unfortunately, you do not have cumulative tables for this particular distribution, so you cannot read off the percentage directly. However, at the back of your AP Statistics book, you spot the cumulative probability tables for the z-score. All you need to do is transform the continuous variable height into the standard normal variable, and you will be able to find the proportion of the population that is shorter than you.

Introduction of the Z-Score

The z-score indicates how far a given value lies from the mean, measured in standard deviations. In other words, the z-score, or standard score, is the number of standard deviations a given data point lies above or below the mean. Standard deviation is essentially a reflection of the amount of variability within a given data set. \(z = {X - \mu \over \sigma}\). Here X is a normally distributed random variable. The population mean \(\mu\) is the average of all values in the given population. The population standard deviation \(\sigma\) is a measure of the deviation from the population mean. The further away the data is from the population mean, the larger the population standard deviation will be. When X is normally distributed, the z-score follows the standard normal distribution, a special normal distribution which is centered at 0 and has variance 1.
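The formula above can be sketched in a few lines of Python. The function name `z_score` and the standard deviation of 3 inches are assumptions made for illustration; only the mean of 5 foot 9 (69 inches) comes from the story above.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return the number of standard deviations x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Heights in inches. The mean of 69 inches (5 foot 9) is from the text;
# the standard deviation of 3 inches and the height of 72 inches are hypothetical.
my_height_z = z_score(72.0, mu=69.0, sigma=3.0)
print(my_height_z)
```

A positive result means the value lies above the mean, a negative result means it lies below.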

Probability Distribution of the Z-Score

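The z-score is a continuous random variable. A normal random variable X with mean \(\mu\) and standard deviation \(\sigma\) has the probability density function

\( f(x) = {1 \over \sigma \sqrt{2\pi}} e^{-{1 \over 2}{(x-\mu)^{2}\over \sigma^{2}}}\)

For the z-score, \(\mu = 0\) and \(\sigma = 1\), so the density simplifies to

\( f(z) = {1 \over \sqrt{2\pi}} e^{-{z^{2} \over 2}}\)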

The Mean of the Z-Score

\(\mathbb{E}[Z] = \mathbb{E}\left[{X - \mu \over \sigma}\right]\)

Using the properties of expected values

\(\mathbb{E}[X+b] = \mathbb{E}[X] + b\)

\(\mathbb{E}[aX] =a\mathbb{E}[X]\)

\(\mathbb{E}[Z] = {1 \over \sigma}\mathbb{E}[X - \mu] = {1 \over \sigma}(\mathbb{E}[X] - \mu)\)

The population mean \(\mu = \mathbb{E}[X]\), hence

\(\mathbb{E}[Z] = 0\)

The expected value of the z-score is 0.

The Variance of the Z-Score

Variance measures how widely values are spread around the population mean. It is the square of the standard deviation \(\sigma\).

\(\mathbb{VAR}[Z] = \mathbb{VAR}\left[{X - \mu \over \sigma}\right]\)

Using the properties of the variance:

\(\mathbb{VAR}[X+b] = \mathbb{VAR}[X]\)

\(\mathbb{VAR}[aX] =a^{2}\mathbb{VAR}[X]\)

\(\mathbb{VAR}[Z] = {1 \over \sigma^{2}}\mathbb{VAR}[X - \mu] = {1 \over \sigma^{2}}\mathbb{VAR}[X]\)

The population variance \(\sigma^{2} = \mathbb{VAR}[X]\), hence

\(\mathbb{VAR}[Z] = 1\)

The variance of the z-score is 1.

The standard deviation of the z-score, the square root of its variance, is therefore also equal to 1.
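The two results above (mean 0, variance 1) can be checked numerically on a small finite population, where the same algebra applies. This is a minimal sketch; the five heights and the helper name `standardize` are made up for illustration.

```python
import statistics


def standardize(values):
    """Convert each value to its z-score using the population mean and std dev."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical population of five heights in inches.
heights = [60.0, 65.0, 70.0, 75.0, 80.0]
z_values = standardize(heights)

z_mean = statistics.fmean(z_values)          # expected to be 0 up to rounding
z_variance = statistics.pvariance(z_values)  # expected to be 1 up to rounding
print(z_mean, z_variance)
```

Whatever the original mean and spread, the standardized values end up centered at 0 with variance 1, apart from floating-point rounding.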

Cumulative Probability Distribution


The cumulative distribution function of the z-score (a standard normal variable, so \(\mu = 0\) and \(\sigma = 1\)) is

\(F(z) = \int_{-\infty}^{z} {1 \over \sqrt{2\pi}} e^{-{t^{2} \over 2}} dt = {1 \over 2}\left(1+\mathrm{erf}\left({z \over \sqrt{2}}\right)\right)\)

This integral cannot be written in terms of elementary functions, and evaluating it is beyond the scope of the course. Fortunately, we can use standard normal (z-score) distribution tables in order to compute F(z).
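Outside of an exam, the error-function form of the cumulative distribution can be evaluated directly. The sketch below uses `math.erf` from the Python standard library; the function name `phi` is our own choice.

```python
import math


def phi(z: float) -> float:
    """Cumulative probability P(Z <= z) for a standard normal variable Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(phi(0.0))             # half of the area lies to the left of the mean
print(round(phi(1.96), 4))  # compare with the table-based value for z = 1.96
```

This gives the same probabilities that the printed tables provide, without being limited to two decimal places of z.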




Normal Distribution Statistical Tables to Find Probabilities

You can use z-score tables to find the shaded region underneath the curve

[Figure: standard normal distribution graph showing the cumulative probability of the z-score as a shaded region]



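Normal distribution tables can be used to find P(Z<a). This is alternatively expressed as F(a) or Φ(a). The table below lists the area between 0 and z, that is P(0≤Z≤z), for z ≥ 0. The row gives z to one decimal place and the column gives the second decimal place. To obtain the cumulative probability P(Z≤z), add 0.5 (the area to the left of 0) to the table entry.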

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

 

The cumulative probability of the random variable Z is the probability that Z takes a value less than or equal to z. It can be seen from the graph that:

P(Z≤0) = 0.5

This follows because the standard normal distribution is symmetric about its mean of 0, so half of the area lies to the left of 0.

P(Z≤4) ≈ 1

Almost the entire region under the curve lies to the left of 4, so the probability is very close to 1. If the entire region is shaded then the probability will be exactly 1.

As z increases, P(Z≤z) gets closer to 1.

Looking at the table, row 1.9 and column 0.06 give 0.4750, which is the area between 0 and 1.96. Adding the area of 0.5 to the left of 0, we find that

P(Z≤1.96) = 0.5 + 0.4750 = 0.975
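The row-and-column lookup can be sketched in code. The dictionary below copies only three rows (0.4, 1.2 and 1.9) from the table above, which lists the area between 0 and z; the names `TABLE_EXCERPT`, `table_area` and `cumulative` are our own and purely illustrative.

```python
# Three rows copied from the table above, keyed by z in tenths (19 means row 1.9).
# Each list holds the ten columns 0.00 to 0.09.
TABLE_EXCERPT = {
    4: [0.1554, 0.1591, 0.1628, 0.1664, 0.1700, 0.1736, 0.1772, 0.1808, 0.1844, 0.1879],
    12: [0.3849, 0.3869, 0.3888, 0.3907, 0.3925, 0.3944, 0.3962, 0.3980, 0.3997, 0.4015],
    19: [0.4713, 0.4719, 0.4726, 0.4732, 0.4738, 0.4744, 0.4750, 0.4756, 0.4761, 0.4767],
}


def table_area(z: float) -> float:
    """Look up P(0 <= Z <= z) for a non-negative z given to two decimals."""
    hundredths = round(z * 100)       # 1.96 -> 196
    row, col = divmod(hundredths, 10)  # 196 -> row 19 (i.e. 1.9), column 6 (i.e. 0.06)
    return TABLE_EXCERPT[row][col]


def cumulative(z: float) -> float:
    """P(Z <= z) = 0.5 (area left of 0) + table entry (area between 0 and z)."""
    return 0.5 + table_area(z)


print(table_area(1.96), cumulative(1.96))
```

Working in whole hundredths avoids floating-point surprises when splitting z into its row and column.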

Finding the Complement of the Cumulative Distribution Function

You can also use the tables to find P(Z≥a) = 1-P(Z≤a). For example,

P(Z≥1.96) = 1 - P(Z≤1.96) = 0.025

Finding the Probability of a Shaded Region

The next step is to be able to use the table to find the probability that the z-score is in a given range.

P(a≤Z≤b) = P(Z≤b) - P(Z≤a)

Find the probability P(0.44≤Z≤1.22) 

From our understanding of inequalities, we can express the probability in terms of two cumulative probabilities:

P(0.44≤Z≤1.22) = P(Z≤1.22) - P(Z≤0.44)

The next stage is to use the tables. The entries for 1.22 and 0.44 are 0.3888 and 0.1700, so the cumulative probabilities are 0.8888 and 0.6700:

P(Z≤1.22) - P(Z≤0.44) = 0.8888 - 0.6700 = 0.2188

The last part is using the tables to find P(Z≤-a), where a is a positive value.

Here we will need to use symmetry. Because the curve is symmetric about 0, the area to the left of -a equals the area to the right of a, so P(Z≤-a) = P(Z>a) = 1 - P(Z≤a). For example,

P(Z≤-1.16) = P(Z>1.16) = 1 - P(Z≤1.16) = 1 - 0.8770 = 0.1230
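The three table manipulations in this part of the article (complement, range and negative argument) can be checked with a short script. It is a sketch that evaluates the cumulative probability with `math.erf` rather than the printed table; the helper names are assumptions, not anything defined in the text.

```python
import math


def phi(z: float) -> float:
    """Cumulative probability P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def upper_tail(a: float) -> float:
    """Complement rule: P(Z >= a) = 1 - P(Z <= a)."""
    return 1.0 - phi(a)


def between(a: float, b: float) -> float:
    """Range rule: P(a <= Z <= b) = P(Z <= b) - P(Z <= a)."""
    return phi(b) - phi(a)


def lower_tail_negative(a: float) -> float:
    """Symmetry rule for a > 0: P(Z <= -a) = 1 - P(Z <= a)."""
    return 1.0 - phi(a)


print(round(upper_tail(1.96), 4))
print(round(between(0.44, 1.22), 4))
print(round(lower_tail_negative(1.16), 4))
```

Small differences from the table-based answers in the last decimal place are expected, because the table entries are themselves rounded to four decimals.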



Finding the Probability of a Random Normal Variable

Understanding the z-score, the cumulative distribution function and the normal distribution tables gives you the skills needed to find probabilities for normal random variables. To do this, you need to know that the variable has a normal distribution, and you need to know its population mean and population variance. You will not usually have tables of probabilities for this particular random variable, but you can compute probabilities for its z-score. This process is called standardizing the normal random variable.


An average light bulb manufactured by the Acme Corporation lasts 300 days with a standard deviation of 60 days. Assuming that bulb life has a normal distribution, what is the probability that an Acme light bulb will last at most 360 days?

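\(P(X \leq 360) = P\left(Z \leq {360 - 300 \over 60}\right) = P(Z \leq 1)\)

We are now ready to use the z-score normal distribution tables. The table entry for z = 1.00 is 0.3413, the area between 0 and 1, so

P(Z≤1) = 0.5 + 0.3413 = 0.8413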
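The light bulb calculation can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The sketch computes the probability twice, once directly on the bulb-life distribution and once after standardizing, to show that the two routes agree; the variable names are our own.

```python
from statistics import NormalDist

mean_life = 300.0  # days, from the example
std_life = 60.0    # days, from the example
limit = 360.0      # days

# Route 1: work directly with the normal distribution of bulb life.
p_direct = NormalDist(mu=mean_life, sigma=std_life).cdf(limit)

# Route 2: standardize first, then use the standard normal distribution.
z = (limit - mean_life) / std_life
p_standardized = NormalDist().cdf(z)  # NormalDist() defaults to mu=0, sigma=1

print(z, round(p_direct, 4), round(p_standardized, 4))
```

Standardizing is what lets a single table serve every normal distribution, whatever its mean and standard deviation.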



Z-score for Sample Means

Unfortunately, it is not always practical or affordable to know the distribution of a random variable along with its population mean and population variance. As a result, you will need to rely on samples to estimate probabilities. Suppose you roll a die n times, and let \(\bar{X}\) be the mean of this sample. The Central Limit Theorem states that when n is large, \(\bar{X}\) is approximately normally distributed, with mean \(\mu\) and standard deviation \(\sigma \over \sqrt{n}\). A common rule of thumb is that the sample size should be at least 30. You may also need to estimate the population mean (expected value) for a die and the population variance. When the sample size n is large you can use the normal distribution; otherwise, when the variance has to be estimated from a small sample, you will have to use the t-distribution. The t-distribution is similar to the normal distribution, just with fatter tails.
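For a sample mean, the standardization divides by the standard deviation of the mean, \(\sigma \over \sqrt{n}\), rather than by \(\sigma\) itself. The sketch below applies this to the die example; the sample size of 36 rolls and the observed sample mean of 4.0 are hypothetical values chosen for illustration.

```python
import math

# Population mean and variance of a fair six-sided die, computed from its faces.
faces = [1, 2, 3, 4, 5, 6]
mu = sum(faces) / len(faces)                               # 3.5
variance = sum((f - mu) ** 2 for f in faces) / len(faces)  # 35/12
sigma = math.sqrt(variance)


def sample_mean_z(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """z-score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return (x_bar - mu) / (sigma / math.sqrt(n))


# Hypothetical sample: 36 rolls with an observed mean of 4.0.
z = sample_mean_z(4.0, mu, sigma, n=36)
print(round(z, 4))
```

Because the denominator shrinks as n grows, the same gap between the sample mean and the population mean produces a larger z-score in a larger sample.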


zscore - Key takeaways

