Z score - Business Definition

In statistics, a Z score represents the number of standard deviations that a particular value of a variable is from the mean. The Z score = [(specific value of a variable) - (mean of the variable)] ÷ (standard deviation of the variable). A Z score indicates how different a specific value is from the mean. Also called standard score, Z value.

Z test Statistics, Sample Calculator

Calculate the z score from a value, the population mean and the population standard deviation using the z test statistics calculator.

Formula Used:
Z value = (X-µ) / σ

Where,
    X = Value of the Random Variable to be standardized,
    µ = Population Mean,
    σ = Population Standard Deviation.

The Z score can be calculated when only the value, the population mean and the population standard deviation are known, which makes the Z test statistic easy to compute.
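
As a minimal sketch of the formula above, the calculation can be written in a few lines of Python. The function name z_score and the sample numbers are illustrative assumptions, not part of the calculator.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Illustrative values: a raw value of 130, mean 100, standard deviation 15.
print(z_score(130, 100, 15))  # 2.0
```

A negative result means the value lies below the mean.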

The Power of Z

A common statistical way of standardizing data on one scale so that a comparison can take place is to use a z-score. The z-score is like a common yardstick for all types of data. Each z-score corresponds to a point in a normal distribution and as such is sometimes called a normal deviate, since a z-score describes how much a point deviates from a mean or specification point.

In Six Sigma parlance, z-score and process sigma are used interchangeably and are sometimes called z-equivalents. Strictly speaking, the process sigma and z-equivalents are only loosely tied to the statistical z-score. The statistical z-score has very strict definitions derived from the rules of the normal distribution. For most applications in Six Sigma, ignoring some of those constraints is innocuous. In usability testing, the benefit of the standardization from process sigmas is that it allows us to meaningfully compare disparate measures like task completion and time on task.

The z-score/process sigma is calculated by subtracting your sample mean from a target data point and dividing by the target standard deviation. This value is a measure of the distance in standard deviations of a sample from the mean and is expressed using the Greek letter σ. If your sample is 3 standard deviations from the spec limit, you would describe your process as 3 sigma, or 3σ.

The further away a sample is from the spec limit, the higher the z-score and process sigma. A higher process sigma means a less defective process. The term Six Sigma originates from the z-score. 6σ means that six standard deviations lie between the mean of a sample and the nearest specification limit. To visualize the z-score, see the Interactive Graph of the Standard Normal Curve.

Each process sigma has two equivalent values which provide a meaningful way to compare data and understand how defective a process is:

DPMO: Each process sigma expresses the probability of a defect in terms of defects per million opportunities, or DPMO. That is, if a condition were to occur one million times, how many times out of that one million would a defect occur? A process sigma of .5 is equal to roughly 308,500 defects per million opportunities. A process sigma of 2.5 means that 6,210 out of 1 million times there will be a defect. For a sample that is 6σ, the DPMO is 0.001. Some organizations prefer to think in terms of defects per opportunity instead of the more abstract "standard deviations above the spec limit."
Probability of a Defect: The process sigma can also be described in terms of a probability of a defect. A z-score of .5 means there is about a 31% probability of encountering a defect. A z-score of .25 means there is a 40% probability of a defect. For a sample that is 6σ, the probability of a defect is .0000001%. Note: Values do not include a 1.5σ shift.
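
The two equivalents can be reproduced with a short Python sketch. It assumes that the probability of a defect is the upper-tail area of the standard normal curve beyond the process sigma, with no 1.5σ shift, which is consistent with the figures quoted above. The function names are illustrative.

```python
import math


def defect_probability(process_sigma):
    """Upper-tail area of the standard normal curve beyond process_sigma.

    Assumes no 1.5 sigma shift. erfc is used so that very small tail
    areas (for example at 6 sigma) keep their precision.
    """
    return 0.5 * math.erfc(process_sigma / math.sqrt(2))


def dpmo(process_sigma):
    """Expected defects per million opportunities for a process sigma."""
    return defect_probability(process_sigma) * 1_000_000


for sigma in (0.25, 0.5, 2.5, 6):
    print(sigma, defect_probability(sigma), dpmo(sigma))
```

Running the loop gives a DPMO of about 308,500 for a process sigma of .5, about 6,210 for 2.5 and about 0.001 for 6.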

Why use a Process Sigma?

The process sigma is helpful in several ways:

It allows you to compare disparate types of data (seconds, which are a continuous measurement, with task completion, which is binary, and with errors, which are discrete count data)
It provides you with a probability of a defect
You can meaningfully compare two different products or processes:
    The process sigma for one release of a software product can be compared to subsequent versions
    You can compare two different products' process sigmas
    You can compare one module of a product with a different module of the same product
You can use the properties of the normal distribution to aid in assessing and improving your data set.

The z-score

The Standard Normal Distribution

Definition of the Standard Normal Distribution

The Standard Normal distribution follows a normal distribution and has mean 0 and standard deviation 1.

Notice that the distribution is perfectly symmetric about 0.

If a distribution is normal but not standard, we can convert a value so that it can be looked up in the Standard Normal distribution table by first finding how many standard deviations away the number is from the mean.

The z-score

The number of standard deviations from the mean is called the z-score and can be found by the formula

        z = (x - µ) / σ

Example

Find the z-score corresponding to a raw score of 132 from a normal distribution with mean 100 and standard deviation 15.

Solution

We compute

        z = (132 - 100) / 15 = 2.133

Example

A z-score of 1.7 was found from an observation coming from a normal distribution with mean 14 and standard deviation 3. Find the raw score.

Solution

We have

        1.7 = (x - 14) / 3

To solve this we just multiply both sides by the denominator 3,

(1.7)(3) = x - 14

5.1 = x - 14

x = 19.1
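
The same rearrangement can be sketched in Python. The function name raw_score is an illustrative assumption. It solves z = (x - mean) / sd for x.

```python
def raw_score(z, mean, sd):
    """Invert the z-score formula: x = mean + z * sd."""
    return mean + z * sd


x = raw_score(1.7, 14, 3)
print(round(x, 1))  # 19.1

# Converting back to a z-score should recover the 1.7 we started from.
print(round((x - 14) / 3, 1))  # 1.7
```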

The z-score and Area

Often we want to find the probability that a z-score will be less than a given value, greater than a given value, or in between two values. To accomplish this, we use the table from the textbook and a few properties about the normal distribution.

Example

Find

P(z < 2.37)

Solution

We use the table. Notice that the picture on the table has a shaded region corresponding to the area to the left of (below) a z-score. This is exactly what we want. Below are a few lines of the table.

z	.00	.01	.02	.03	.04	.05	.06	.07	.08	.09
2.2	.9861	.9864	.9868	.9871	.9875	.9878	.9881	.9884	.9887	.9890
2.3	.9893	.9896	.9898	.9901	.9904	.9906	.9909	.9911	.9913	.9916
2.4	.9918	.9920	.9922	.9925	.9927	.9929	.9931	.9932	.9934	.9936

The rows correspond to the ones and tenths digits of the z-score and the columns correspond to the hundredths digit. For our problem we want the row 2.3 (from 2.37) and the column .07 (from 2.37). The number in the table that matches this is .9911.

Hence

P(z < 2.37) = .9911
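
As a cross-check on the table lookup, the same area can be computed with Python's standard library. This is a sketch that assumes Python 3.8 or later, where statistics.NormalDist is available. It is not the method the table-based solution uses.

```python
from statistics import NormalDist

# NormalDist() with no arguments is the standard normal (mean 0, sd 1).
standard_normal = NormalDist()

# cdf(z) is the area to the left of z, the same quantity the table lists.
p = standard_normal.cdf(2.37)
print(round(p, 4))  # 0.9911
```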

Example

Find

P(z > 1.82)

Solution

In this case, we want the area to the right of 1.82. This is not what is given in the table. We can use the identity

P(z > 1.82) = 1 - P(z < 1.82)

Reading the table gives

P(z < 1.82) = .9656

Our answer is

P(z > 1.82) = 1 - .9656 = .0344

Example

Find

P(-1.18 < z < 2.1)

Solution

Once again, the table does not exactly handle this type of area. However, the area between -1.18 and 2.1 is equal to the area to the left of 2.1 minus the area to the left of -1.18. That is

P(-1.18 < z < 2.1) = P(z < 2.1) - P(z < -1.18)

To find P(z < 2.1) we rewrite it as P(z < 2.10) and use the table to get

P(z < 2.10) = .9821.

The table also tells us that

P(z < -1.18) = .1190

Now subtract to get

P(-1.18 < z < 2.1) = .9821 - .1190 = .8631
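
The last two examples can be checked the same way. The sketch below assumes Python 3.8 or later and uses the cumulative distribution function in place of the table. The variable names are illustrative.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area to the left of z under the standard normal curve

# Area to the right of 1.82: the complement of the area to the left.
upper_tail = 1 - phi(1.82)
print(round(upper_tail, 4))  # 0.0344

# Area between -1.18 and 2.1: left area at 2.1 minus left area at -1.18.
between = phi(2.10) - phi(-1.18)
print(round(between, 4))  # 0.8631
```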

Excel Basics — Finding areas under the normal distribution.

Excel has some very useful functions for finding areas under the normal distribution.

NORMSDIST(z)

Z is the value for which you want the distribution.

Returns the standard normal cumulative distribution function. The distribution has a mean of 0 (zero) and a standard deviation of one. Use this function in place of a table of standard normal curve areas.

1) a) Pick a cell and enter a z score into it (for example 2). Don’t forget to add a label so you’ll know what you put in this cell.

b) In a cell next to it, enter the function NORMSDIST(Z), using the address of the cell where you placed the z score as your z value. What did you get?

(If you used z=2, you should get an area of 0.97725; in other words, more than 97% of the population have scores lower than your z. Try other values of z in order to get a better feeling for the use of this function, for example 0, 1, 5, -1, -3.)

NORMSINV(probability)

Probability is a probability corresponding to the normal distribution.

Returns the inverse of the standard normal cumulative distribution. The distribution has a mean of zero and a standard deviation of one.

NORMSINV will return a z score that corresponds to an area under the curve. The area should be between 0 and 1.

2) a) Pick a cell and enter a probability into it (for example 0.975). Don’t forget to add a label so you’ll know what you put in this cell.

b) In a cell next to it, enter the function NORMSINV(probability), using the address of the cell where you placed the probability. What did you get?

(If you used p=0.975, you should get a z score of 1.95996. What would you get if you used p = 0.97725? You should get a value close to 2, your z from #1. Try other values of p in order to get a better feeling for the use of this function, for example 0.5, 0.99.)
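
For readers working outside Excel, the two functions have close counterparts in Python's standard library. The sketch below assumes Python 3.8 or later. The mapping to NORMSDIST and NORMSINV is an analogy, not an Excel API.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

area = std.cdf(2)        # plays the role of NORMSDIST(2)
z = std.inv_cdf(0.975)   # plays the role of NORMSINV(0.975)

print(round(area, 4))  # 0.9772
print(round(z, 2))     # 1.96

# The two functions are inverses: feeding the area back returns the z score.
print(round(std.inv_cdf(area), 6))  # 2.0
```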

In real life, we usually deal with normal distributions that are not standardized, so they are not expressed in z scores. Excel has several functions that will let you compute areas under the curve directly from your scores without standardizing them first.

NORMDIST(x,mean,standard_dev,cumulative)

X is the value for which you want the distribution.

Mean is the arithmetic mean of the distribution.

Standard_dev is the standard deviation of the distribution.

Cumulative is a logical value that determines the form of the function. If cumulative is TRUE, NORMDIST returns the cumulative distribution function; if FALSE, it returns the probability density function. IN THIS EXERCISE USE "TRUE" SINCE YOU WANT THE AREA UNDER THE CURVE.

3) a) Enter your score, mean and S.D into different cells in Excel. Don’t forget to add labels so you’ll know what you put in each cell. For example, use x=102, m=100, sd=2. You can also enter the word TRUE into a cell so you can use it in the function.

b) In another cell enter the function NORMDIST(x,mean,SD,cumulative), using the addresses of the cells where you placed the x, mean, sd and TRUE. What did you get?

(If you used the sample values, you should get an area of 0.8413. Try other values of x, m and s.d in order to get a better feeling for the use of this function.)

NORMINV(probability,mean,standard_dev)

Probability is a probability corresponding to the normal distribution.

Mean is the arithmetic mean of the distribution.

Standard_dev is the standard deviation of the distribution.

Returns the inverse of the normal cumulative distribution for the specified mean and standard deviation.

4) a) Type a probability, mean and S.D into different cells (for example p = 0.84134, m = 100, s.d = 2). Don’t forget to add labels so you’ll know what you put in each cell.

b) In a cell next to them, enter the function NORMINV(probability,mean,s.d), using the addresses of the cells where you placed the probability, mean and S.D. What did you get?

(If you used the sample values, you should get a score close to 102, which was the x value from #3. This means that 84.134% of the population have a score below 102 in a population that is normally distributed with a mean of 100 and S.D of 2. Try other values of p, m and S.D in order to get a better feeling for the use of this function.)
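
The non-standardized functions can be mirrored in Python as well. The sketch below assumes Python 3.8 or later. The comments naming NORMDIST and NORMINV describe an analogy only.

```python
from statistics import NormalDist

# A normal distribution with mean 100 and standard deviation 2.
scores = NormalDist(mu=100, sigma=2)

area_below = scores.cdf(102)      # like NORMDIST(102, 100, 2, TRUE)
score = scores.inv_cdf(0.84134)   # like NORMINV(0.84134, 100, 2)
density = scores.pdf(102)         # like NORMDIST(102, 100, 2, FALSE)

print(round(area_below, 4))  # 0.8413
print(round(score, 2))       # 102.0
```

The density value is the height of the curve at x, not an area, which is why the exercise uses TRUE.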

Questions about the normal distribution often ask you to calculate the area under the curve between two scores or the probability that a score would turn out to be between two scores. The following exercise shows you how to calculate those values easily.

5) Enter into cell A16 "M", B16 "SD", C16 "X1", D16 "X2", E16 "Z1", F16 "Z2",G16 "F(Z1)", H16 "F(Z2)", I16 "p".

Enter the values 100, 2, 96, 104 below M, SD, X1, X2 respectively. These values are usually given to you in questions. Below Z1 we will calculate the standard score of X1. There are two ways to do this: either use the formula we learned in class, (X - µ)/σ, or use the Excel function STANDARDIZE(x, mean, S.D). Choose either one of them.

Calculate Z2 in the same way. If you used the sample values you should get z scores of -2 and 2.

Given the Z scores in E17 and F17, which Excel function would you use to calculate the area under the curve? (NORMSDIST). Calculate F(Z1) and F(Z2) using NORMSDIST. You should get areas of 0.023 and 0.977.

In order to calculate the area between these two scores, or the probability that a score would fall between X1 and X2, calculate the difference between F(Z2) and F(Z1) in cell I17 (H17-G17). You should get a value of 0.954, so there is a 95.4% chance that a given score would fall between 96 and 104 in our distribution.
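
The worksheet in exercise 5 can be sketched step by step in Python, with one variable per worksheet column. It assumes Python 3.8 or later. The variable names follow the column labels and are otherwise illustrative.

```python
from statistics import NormalDist

m, sd = 100, 2      # columns M and SD
x1, x2 = 96, 104    # columns X1 and X2

# Standard scores, the same calculation as STANDARDIZE(x, mean, S.D).
z1 = (x1 - m) / sd
z2 = (x2 - m) / sd
print(z1, z2)  # -2.0 2.0

# Areas to the left of each z score, as NORMSDIST would return.
std = NormalDist()
f_z1 = std.cdf(z1)
f_z2 = std.cdf(z2)

# Probability that a score falls between X1 and X2.
p = f_z2 - f_z1
print(round(f_z1, 3), round(f_z2, 3), round(p, 3))  # 0.023 0.977 0.954
```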

This could be very useful when working on the homework problems. For example, see question 4 in chapter 6.

We are given the mean height of a Merkin plant m=65 and the S.D = 3.

What is the probability that the plant will be between 64 and 67 inches tall?

D) What is the probability that the plant will be less than 40 inches tall? I would use NORMDIST to answer this.

E) What is the probability that the plant will be more than 60 inches tall? You can use NORMDIST to calculate the probability that the plant will be less than 60 inches and then get the probability that the plant will be more than 60 inches tall using the fact that the total area under the curve is equal to 1.
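
The three plant-height questions follow the same pattern. Below is a hedged Python sketch that assumes heights are normally distributed with the given mean of 65 and S.D of 3, and that Python 3.8 or later is available. The variable names are illustrative, and it is meant as a way to check your own Excel answers.

```python
from statistics import NormalDist

heights = NormalDist(mu=65, sigma=3)

# Between 64 and 67 inches: difference of two cumulative areas.
between_64_and_67 = heights.cdf(67) - heights.cdf(64)

# Less than 40 inches: cumulative area, as with NORMDIST(..., TRUE).
less_than_40 = heights.cdf(40)

# More than 60 inches: total area (1) minus the area below 60.
more_than_60 = 1 - heights.cdf(60)

print(between_64_and_67, less_than_40, more_than_60)
```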

Question:

What is the z score that corresponds to alpha = 0.05? In other words, if we set alpha to 0.05, what is the z score above which any result would cause us to reject the null hypothesis?

Enter 0.05 into a cell in Excel and label it Desired alpha. In another cell, calculate the bottom portion of the distribution. In other words, if your alpha is set to 0.05, then 0.95 of the population is below it, so the bottom portion is equal to 1 - desired alpha.

Now use the function NORMSINV(p) to calculate the z score that corresponds to this alpha. Your p is the bottom portion and you should get z = 1.645. You will see this value many times in the next few weeks.

As you will soon learn in class, hypothesis testing can be either non-directional or directional. If we divide the distribution into a bottom portion and the region above alpha, we are using a directional hypothesis and predicting that our effect will be found in the upper portion of the curve. Sometimes we don’t know where we will find an effect, so we use a non-directional test. In that case, an alpha of 0.05 should be divided by 2 so that we place 0.025 on one end of the curve and 0.025 on the other end.

Enter your desired alpha into a cell in Excel. The upper tail in this case will be alpha/2. Calculate this value in another cell.

As before, the bottom portion is 1 - upper_tail; the only difference is that now the upper tail is equal to alpha/2. Calculate the bottom portion in a different cell. You should get 0.975.

You can now calculate the z score that corresponds to the bottom portion using NORMSINV(p). You should get z = 1.96. Because of symmetry (the standardized normal distribution is symmetric around 0), the z score that cuts off the lower 0.025 of the curve is equal to -z, or -1.96. You can also get this value directly from the tail area by using NORMSINV(alpha/2), that is NORMSINV(0.025). Don’t worry if this is not entirely clear; the class should clear up any confusion.
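
Both critical values can be reproduced with a short Python sketch. It assumes Python 3.8 or later, and inv_cdf stands in for NORMSINV. The variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()
alpha = 0.05

# Directional (one-tailed) test: all of alpha sits in the upper tail.
one_tailed = std.inv_cdf(1 - alpha)
print(round(one_tailed, 3))  # 1.645

# Non-directional (two-tailed) test: alpha/2 in each tail.
upper = std.inv_cdf(1 - alpha / 2)
lower = std.inv_cdf(alpha / 2)
print(round(upper, 2), round(lower, 2))  # 1.96 -1.96
```

Changing alpha to another value, such as 0.01, gives the matching cutoffs without consulting a table.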

Z-SCORE DEFINE,PROCESS,EXAMPLE,HOW TO DO IN EXCEL

Monday, 7 October 2013

Z-SCORE GROUP 4 PARSHAV MEHTA 2013021
