Z score - Business Definition
In statistics, a Z score represents the number of standard
deviations that a particular value of a variable is from the mean. The Z score
= [(specific value of a variable) - (mean of the variable)] ÷ (standard
deviation of the variable). A Z score indicates how different a specific value
is from the mean. Also called standard score, Z value.
Z test Statistics, Sample Calculator
Calculate the z score from the two
samples of population mean and standard deviation using z test statistics
calculator.

Formula Used:
Z value = (X-µ) / σ
Where,
X = Value of the random variable being standardized,
µ = Population Mean,
σ = Population Standard Deviation.
We can also calculate the Z score when only the value, the population mean and the standard deviation are known, which makes the Z test statistic calculation easier.
The Power of Z
A common statistical way of standardizing data on one scale so a comparison can
take place is using a z-score. The z-score is like a common yardstick for all
types of data. Each z-score corresponds to a point in a normal distribution and
as such is sometimes called a normal deviate, since a z-score describes how
much a point deviates from a mean or specification point.
In Six Sigma parlance, z-score and process sigma are used interchangeably and are sometimes called z-equivalents. Strictly speaking, the process sigma and z-equivalents are only loosely tied to the statistical z-score, which has very strict definitions derived from the rules of the normal distribution. For most applications in Six Sigma, ignoring some of those constraints is innocuous. In usability testing, the benefit of the standardization from process sigmas is that it allows us to meaningfully compare disparate measures like task completion and time on task.
The z-score/process sigma is calculated by subtracting your sample mean from a target data point and dividing by the standard deviation. This value measures the distance, in standard deviations, of a sample from the mean and is expressed using the Greek letter σ. If your sample is 3 standard deviations from the spec limit, you would describe your process as 3 sigma, or 3σ.
The further away a sample is from the spec limit, the higher the z-score and process sigma. A higher process sigma means a less defective process. The term Six Sigma originates from the z-score: 6σ means that six standard deviations lie between the mean of a sample and the nearest specification limit. To visualize the Z-score, see the Interactive Graph of the Standard Normal Curve.
Each process sigma has two equivalent values which provide a meaningful way to compare data and understand how defective a process is:
- DPMO: Each process sigma expresses the probability of a defect in terms
of defects per million opportunities, or DPMO. That is, if a condition
were to occur one million times, how many times out of that one million
would a defect occur? A process sigma of 0.5 is equal to about 308,000 defects
per million opportunities, and a process sigma of 2.5 means that a defect
will occur 6,210 times out of 1 million. For a sample that is 6σ, the
DPMO is 0.001. Some organizations prefer to think in terms of defects per
opportunities instead of the more abstract "standard deviations above
the spec limit."
- Probability of a Defect: The process sigma can also be described in terms
of a probability of a defect. A z-score of 0.5 means there is about a 31%
probability of encountering a defect. A z-score of 0.25 means there is a
40% probability of a defect. For a sample that is 6σ, the probability of a
defect is 0.0000001%. Note: Values do not include a 1.5σ shift.
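As a rough illustration, the DPMO and defect-probability figures above can be reproduced from the upper-tail area of the standard normal curve. The sketch below assumes no 1.5σ shift, as the note above states. The function names are illustrative and do not come from any Six Sigma tool.

```python
import math


def defect_probability(process_sigma: float) -> float:
    """Upper-tail area P(Z > process_sigma) of the standard normal curve."""
    # erfc keeps precision in the far tail (e.g. 6 sigma), where 1 - cdf would not.
    return 0.5 * math.erfc(process_sigma / math.sqrt(2))


def dpmo(process_sigma: float) -> float:
    """Defects per million opportunities for a given process sigma (no shift)."""
    return defect_probability(process_sigma) * 1_000_000


for sigma in (0.25, 0.5, 2.5, 6.0):
    p = defect_probability(sigma)
    print(f"sigma={sigma}: P(defect)={p:.3e}, DPMO={dpmo(sigma):.3f}")
```

Using `math.erfc` instead of subtracting a cumulative probability from 1 is a deliberate choice here: at 6σ the tail area is around one in a billion, which would be lost to rounding otherwise.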
Why use a Process Sigma?
The process sigma is helpful in several ways:
- It allows you to compare disparate types of data (seconds, which are a
continuous measurement; task completion, which is binary; and errors,
which are discrete count data)
- It provides you with a probability of a defect
- You can meaningfully compare two different products or processes:
  - The process sigma for one release of a software product can be compared to subsequent versions
  - You can compare two different products' process sigmas
  - You can compare one module of a product with a different module of the same product
- You can use the properties of the normal distribution to aid in assessing and improving your data set.
The z-score
The Standard Normal Distribution
Definition of the Standard Normal Distribution
The Standard Normal distribution follows a normal distribution and has mean 0 and standard deviation 1.
Notice that the distribution is perfectly symmetric about 0.
If a distribution is normal but not standard, we can convert a value for use with the standard normal distribution table by first finding how many standard deviations away the number is from the mean.
The z-score
The number of standard deviations from the mean is called the z-score and can be found by the formula
z = (x - μ) / σ
Example
Find the z-score corresponding to a raw score of 132 from a normal
distribution with mean 100 and standard deviation 15.
Solution
We compute
z = (132 - 100) / 15 = 2.133
Example
A z-score of 1.7 was found from an observation coming from
a normal distribution with mean 14 and standard
deviation 3. Find the raw score.
Solution
We have
1.7 = (x - 14) / 3
To solve this we just multiply both sides by the denominator 3,
(1.7)(3) = x - 14
5.1 = x - 14
x = 19.1
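The two worked examples above can be checked with a short sketch. The helper names `z_score` and `raw_score` are made up for illustration. The second function is the z-score formula solved for x.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def raw_score(z: float, mean: float, std_dev: float) -> float:
    """Invert the z-score formula: x = mean + z * std_dev."""
    return mean + z * std_dev


print(round(z_score(132, 100, 15), 3))  # first example: raw score 132
print(round(raw_score(1.7, 14, 3), 1))  # second example: z-score 1.7
```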
The z-score and Area
Often we want to find the probability that a z-score will be less than a given value,
greater than a given value, or in between two values. To accomplish this,
we use the table from the textbook and a few properties
about the normal distribution.
Example
Find P(z < 2.37)
Solution
We use the table. Notice the picture on the table has a
shaded region corresponding to the area to the left of (below) a z-score.
This is exactly what we want. Below are a few lines of the table.
z   | .00   | .01   | .02   | .03   | .04   | .05   | .06   | .07   | .08   | .09
2.2 | .9861 | .9864 | .9868 | .9871 | .9875 | .9878 | .9881 | .9884 | .9887 | .9890
2.3 | .9893 | .9896 | .9898 | .9901 | .9904 | .9906 | .9909 | .9911 | .9913 | .9916
2.4 | .9918 | .9920 | .9922 | .9925 | .9927 | .9929 | .9931 | .9932 | .9934 | .9936
The rows correspond to the ones and tenths digits of the z-score and the columns
correspond to the hundredths digit. For our problem we want the row 2.3 (from 2.37) and the column .07 (from 2.37). The number in
the table that matches this is .9911.
Hence
P(z < 2.37) = .9911
Example
Find P(z > 1.82)
Solution
In this case, we want the area to the right of 1.82. This is not what
is given in the table. We can use the identity
P(z > 1.82) = 1 - P(z < 1.82)
Reading the table gives
P(z < 1.82) = .9656
Our answer is
P(z > 1.82) = 1 - .9656 = .0344
Example
Find P(-1.18 < z < 2.1)
Solution
Once again, the table does not exactly handle this type of area. However, the
area between -1.18 and 2.1 is equal to the
area to the left of 2.1 minus the area to the left of -1.18. That is
P(-1.18 < z < 2.1) = P(z < 2.1) - P(z < -1.18)
To find P(z < 2.1) we rewrite it as P(z < 2.10) and use the table to get
P(z < 2.10) = .9821.
The table also tells us that
P(z < -1.18) = .1190
Now subtract to get
P(-1.18 < z < 2.1) = .9821 - .1190 = .8631
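The three table lookups above (area to the left, area to the right, and area between two z-scores) can also be computed directly. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist()


def area_left(z: float) -> float:
    """P(Z < z): the value a standard normal table lists."""
    return std_normal.cdf(z)


def area_right(z: float) -> float:
    """P(Z > z) = 1 - P(Z < z)."""
    return 1 - std_normal.cdf(z)


def area_between(z_low: float, z_high: float) -> float:
    """P(z_low < Z < z_high) = P(Z < z_high) - P(Z < z_low)."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


print(round(area_left(2.37), 4))
print(round(area_right(1.82), 4))
print(round(area_between(-1.18, 2.1), 4))
```

The results should agree with the table-based answers to about four decimal places. Small differences in the last digit can appear because the table values are themselves rounded before subtraction.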
Excel Basics — Finding areas under the normal distribution.
Excel has some very useful functions for finding areas under the normal distribution.
NORMSDIST(z)
Z is the value for which you want the distribution.
Returns the standard normal cumulative distribution function. The distribution has a mean of 0
(zero) and a standard deviation of one. Use this function in place of a table
of standard normal curve areas.
1)
a) Pick a cell and enter a z score into it (for example
2). Don’t forget to add a label so you’ll know what you put in this cell.
b) In a cell next to it, enter the function NORMSDIST(Z), using the address of the
cell where you placed the z score as your z value. What did you get?
(If you used z=2, you should get an area of 0.97725; in other words, more than
97% of the population have scores lower than your z. Try other values of z in
order to get a better feeling for the use of this function, for example
0, 1, 5, -1, -3.)
NORMSINV(probability)
Probability is a probability corresponding to the normal distribution.
Returns the inverse of the standard normal cumulative distribution. The distribution
has a mean of zero and a standard deviation of one.
NORMSINV will return a z score that corresponds to an area under the curve. The area
should be between 0 and 1.
2)
a) Pick a cell and enter a probability into it (for
example 0.975). Don’t forget to add a label so you’ll know what you put in
this cell.
b) In a cell next to it, enter the function NORMSINV(probability), using the address
of the cell where you placed the probability. What did you get?
(If you used p=0.975, you should get a z score of 1.95996. What would you get if
you used p = 0.97725? You should get a value close to 2, your z from #1. Try
other values of p in order to get a better feeling for the use of this
function, for example 0.5, 0.99.)
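For readers without Excel, exercises 1 and 2 can be approximated in Python. The sketch below assumes that `statistics.NormalDist` is an acceptable stand-in for the spreadsheet functions. The names `normsdist` and `normsinv` are hypothetical wrappers chosen to mirror the Excel names, not an existing library API.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0, sigma=1)


def normsdist(z: float) -> float:
    """Analogue of Excel's NORMSDIST: area to the left of z."""
    return _STD_NORMAL.cdf(z)


def normsinv(probability: float) -> float:
    """Analogue of Excel's NORMSINV: z score with the given area to its left.

    The probability must be strictly between 0 and 1.
    """
    return _STD_NORMAL.inv_cdf(probability)


print(round(normsdist(2), 5))      # area below z = 2
print(round(normsinv(0.975), 5))   # z score with 97.5% of the area below it
print(round(normsinv(normsdist(2)), 5))  # round trip back to z = 2
```

The last line shows that the two functions are inverses of each other, which is the point of comparing the outputs of exercises 1 and 2.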
In real life, we usually
deal with normal distributions that are not standardized, so they are not
expressed in z scores. Excel has several functions that will let you compute
areas under the curve directly from your scores without standardizing them
first.
NORMDIST(x,mean,standard_dev,cumulative)
X is the value for which you want the distribution.
Mean is the arithmetic mean of the distribution.
Standard_dev is the standard deviation of the distribution.
Cumulative is a logical value that determines the form of the function. If cumulative is
TRUE, NORMDIST returns the cumulative distribution function; if FALSE, it
returns the probability density function (the height of the curve at x). IN THIS EXERCISE USE
"TRUE" SINCE YOU WANT THE AREA UNDER THE CURVE.
3)
a) Enter your score, mean and S.D into different cells in Excel. Don’t forget to add
a label so you’ll know what you put in each cell. For example, use
x=102, m=100, sd=2. You can also enter the word TRUE into a cell so you can use
it in the function.
b) In another cell enter the function NORMDIST(x,mean,SD,cumulative), using the
addresses of the cells where you placed the x, mean, sd and TRUE. What did you get?
(If you used the sample values, you should get an area of 0.8413. Try other values
of x, m and s.d in order to get a better feeling for the use of this function.)
NORMINV(probability,mean,standard_dev)
Probability is a probability corresponding to the normal distribution.
Mean is the arithmetic mean of the distribution.
Standard_dev is the standard deviation of the distribution.
Returns the inverse of
the normal cumulative distribution for the specified mean and standard
deviation.
4)
a) Type into different cells a probability, mean and S.D (for example p =
0.84134, m = 100, s.d = 2). Don’t forget to add a label so you’ll know what you
put in each cell.
b) In a cell next to it, enter the function NORMINV(probability,mean,s.d), using the
addresses of the cells where you placed the probability, mean and S.D. What did
you get?
(If you used the sample values, you should get a score close to 102, which was the x
value from #3. This means that about 84.13% of the population have a score below 102
in a population that is normally distributed with a mean of 100 and S.D of
2. Try other values of p, m and S.D in order to get a better feeling for the
use of this function.)
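Exercises 3 and 4 can be mirrored outside Excel as well. The following sketch assumes `statistics.NormalDist` as the underlying implementation. `norm_dist` and `norm_inv` are hypothetical helper names modeled on the Excel functions, with the `cumulative` flag switching between the area under the curve and the curve's height.

```python
from statistics import NormalDist


def norm_dist(x: float, mean: float, std_dev: float, cumulative: bool) -> float:
    """Analogue of Excel's NORMDIST.

    cumulative=True  -> area under the curve to the left of x
    cumulative=False -> height of the density curve at x
    """
    dist = NormalDist(mu=mean, sigma=std_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)


def norm_inv(probability: float, mean: float, std_dev: float) -> float:
    """Analogue of Excel's NORMINV: the score with the given area below it."""
    return NormalDist(mu=mean, sigma=std_dev).inv_cdf(probability)


print(round(norm_dist(102, 100, 2, True), 4))   # sample values from exercise 3
print(round(norm_inv(0.84134, 100, 2), 2))      # sample values from exercise 4
```

Because these functions take the mean and standard deviation directly, no manual standardizing step is needed, which is the same convenience the Excel versions offer.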
Questions about the
normal distribution often ask you to calculate the area under the curve between
two scores, or the probability that a score would turn out to be between two
scores. The following exercise shows you how to calculate those values easily.
5) Enter into cell A16
"M", B16 "SD", C16 "X1", D16 "X2", E16
"Z1", F16 "Z2", G16 "F(Z1)", H16
"F(Z2)", I16 "p".
Enter the values
100, 2, 96, 104 below M, SD, X1, X2 respectively. These values are usually given
to you in questions. Below Z1 we will calculate the standard score of X1. There
are two ways to do this: either use the formula we learned in class, (X - μ)/σ,
or use the Excel function STANDARDIZE(x, mean, S.D). Choose either one of
them.
Calculate Z2 in the same
way. If you used the sample values you should get z scores of -2 and 2.
Given the Z scores in
E17 and F17, which Excel function would you use to calculate the area under the
curve? (NORMSDIST.) Calculate F(Z1) and F(Z2) using NORMSDIST; you should get
areas of 0.023 and 0.977.
In order to calculate
the area between these two scores, or the probability that a score would fall
between X1 and X2, calculate the difference between F(Z2) and F(Z1) in cell
I17 (H17-G17). You should get a value of 0.954, so there is a 95.4% chance that a
given score would fall between 96 and 104 in our distribution.
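The spreadsheet layout in exercise 5 amounts to three steps: standardize both scores, look up both areas, and subtract. A sketch of the same steps in Python follows. The function names are illustrative, with `standardize` playing the role of Excel's STANDARDIZE.

```python
from statistics import NormalDist


def standardize(x: float, mean: float, std_dev: float) -> float:
    """Standard score of x, i.e. (x - mean) / std_dev."""
    return (x - mean) / std_dev


def probability_between(x1: float, x2: float, mean: float, std_dev: float) -> float:
    """Area under the normal curve between x1 and x2 (x1 < x2)."""
    std_normal = NormalDist()
    z1 = standardize(x1, mean, std_dev)  # column Z1 in the worksheet
    z2 = standardize(x2, mean, std_dev)  # column Z2
    f_z1 = std_normal.cdf(z1)            # column F(Z1)
    f_z2 = std_normal.cdf(z2)            # column F(Z2)
    return f_z2 - f_z1                   # column p


p = probability_between(96, 104, 100, 2)
print(f"P(96 < X < 104) is approximately {p:.3f}")
```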
This could be very
useful when working on the homework problems. For example, see question 4 in
chapter 6.
We are given the mean
height of a Merkin plant, m=65, and the S.D = 3.
- What is the probability that the plant will be between
64 and 67 inches tall?
D) What is the probability that the plant will be less than 40 inches tall? I
would use NORMDIST to answer this.
E) What is the probability that the plant will be more than 60 inches tall? You
can use NORMDIST to calculate the probability that the plant will be less than
60 inches tall, and then get the probability that the plant will be more than 60
inches tall using the fact that the total area under the curve is equal to 1.
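As a hedged illustration of the three plant-height questions above, the sketch below assumes heights are normally distributed with mean 65 and standard deviation 3 inches, as stated in the problem. The variable names are chosen for this example only.

```python
from statistics import NormalDist

# Plant heights in inches: mean 65, standard deviation 3 (from the problem).
heights = NormalDist(mu=65, sigma=3)

# Between 64 and 67 inches: area left of 67 minus area left of 64.
p_between_64_67 = heights.cdf(67) - heights.cdf(64)

# Less than 40 inches: more than 8 standard deviations below the mean.
p_below_40 = heights.cdf(40)

# More than 60 inches: total area (1) minus the area below 60.
p_above_60 = 1 - heights.cdf(60)

print(f"P(64 < height < 67) = {p_between_64_67:.4f}")
print(f"P(height < 40)      = {p_below_40:.2e}")
print(f"P(height > 60)      = {p_above_60:.4f}")
```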
Question:
What is the z score that
corresponds to alpha = 0.05? In other words, if we set alpha to 0.05,
what is the z score above which any value would cause us to reject the null
hypothesis?
Enter 0.05 into a cell
in Excel and label it Desired alpha. In another cell, calculate the bottom portion
of the distribution. In other words, if your alpha is set to 0.05, then 0.95 of
the population is below it, so the bottom portion is equal to 1 - desired alpha.
Now use the function
NORMSINV(p) to calculate the z score that corresponds to this alpha. Your p is
the bottom portion and you should get z = 1.645. You will see this value many
times in the next few weeks.
As you will soon learn
in class, hypothesis testing can be either non-directional or directional. If
we divide the distribution into a bottom portion and an upper region of area alpha, we
are using a directional hypothesis and predicting that our effect will be found
in the upper portion of the curve. Sometimes we don’t know where we will find an
effect, so we use a non-directional test. In that case, an alpha of 0.05 should
be divided by 2 so that we place 0.025 on one end of the curve and 0.025 on the
other end.
Enter your desired alpha
into a cell in Excel. The upper tail in this case will be alpha/2. Calculate
this value in another cell.
As before, the bottom
portion is 1 - upper_tail; the only difference is that now the upper tail is
equal to alpha/2. Calculate the bottom portion in a different cell. You should
get 0.975.
You can now calculate
the z score that corresponds to the bottom portion using NORMSINV(p). You
should get z=1.96. Because of symmetry (the standardized normal
distribution is symmetric around 0), the z score that cuts off the lower
tail is equal to -z, or -1.96. You can also get this value by using
NORMSINV(alpha/2), that is, NORMSINV(0.025). Don’t worry if this is not entirely
clear; the class should clear up any confusion.
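The directional and non-directional cutoffs described above can be sketched as follows. This assumes `statistics.NormalDist.inv_cdf` as a stand-in for NORMSINV. The two function names are hypothetical and chosen only for this example.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def one_tailed_critical_z(alpha: float) -> float:
    """Directional test: z score with area alpha above it."""
    bottom_portion = 1 - alpha
    return _STD_NORMAL.inv_cdf(bottom_portion)


def two_tailed_critical_z(alpha: float) -> tuple[float, float]:
    """Non-directional test: (lower, upper) cutoffs with alpha/2 in each tail."""
    upper_tail = alpha / 2
    upper = _STD_NORMAL.inv_cdf(1 - upper_tail)
    lower = _STD_NORMAL.inv_cdf(upper_tail)  # equals -upper by symmetry
    return lower, upper


print(round(one_tailed_critical_z(0.05), 3))
lower, upper = two_tailed_critical_z(0.05)
print(round(lower, 2), round(upper, 2))
```

Computing the lower cutoff from `alpha / 2` directly, rather than negating the upper one, mirrors the NORMSINV(0.025) check in the text and shows the symmetry numerically.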