Understanding Variance and Standard Deviation in Probability

MAT 2572 Probability w/ Statistics, Halleck

Explore the concept of variance as a measure of dispersion in statistics, illustrated through simple examples. Dive into the relationship between variance, standard deviation, and the interpretation of average distance from the mean. Discover how to calculate variance and explore the differences between hypergeometric and binomial distributions in probability theory.

  • Variance
  • Standard Deviation
  • Probability
  • Statistics
  • Distributions


Presentation Transcript


  1. MAT 2572 Probability w/Statistics, Halleck Day 12 slides: 3.6 The Variance

  2. What is variance? Variance is a measure of dispersion or variation: the extent to which values are spread out. Let's look at 2 very simple examples:

     X:  k     -1    1        Y:  k     -1000   1000
         P(k)  1/2   1/2          P(k)  1/2     1/2

     The means of X and Y are both 0. For X, any outcome will be 1 unit from the mean; for Y, any outcome will be 1000 units from the mean. Clearly, Y has a much larger variance.

  3. Definition: Var(X) = E[(X − μ)²], the expected squared distance from the mean. For X and Y from the previous example, Var(X) = 1² = 1 and Var(Y) = 1000² = 1,000,000.
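A minimal sketch of the calculation in Python (plain standard library; the two distributions are the ones from slide 2):

```python
# Variance from the definition Var(X) = E[(X - mu)^2] for a finite
# discrete distribution given as values and probabilities.
def variance(values, probs):
    mu = sum(v * p for v, p in zip(values, probs))  # E(X)
    return sum((v - mu) ** 2 * p for v, p in zip(values, probs))

print(variance([-1, 1], [0.5, 0.5]))        # 1.0       = 1^2
print(variance([-1000, 1000], [0.5, 0.5]))  # 1000000.0 = 1000^2
```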

  4. Standard deviation. Unfortunately, variance does not have the same units as the random variable. So we define σ (Greek letter sigma) = σ_X = Std(X) = √Var(X). For the previous examples, σ_X = 1 and σ_Y = 1000. Intuitively, you can think of the standard deviation as: a random outcome will on average be σ units from the mean.

  5. Interpretation of variance. The notion of average distance from the mean introduced on the previous slide is not quite correct. A more precise notion: put weights on the shaft as before (weight p(k) at position k) and spin it around the fulcrum at μ; then the resistance to a change in the speed of rotation, the moment of inertia, is the same as the variance: I = Σ (k − μ)² p(k) = Var(X).

  6. Theorem 3.6.1 (computational formula). In practice, the definition of variance is hardly ever used. Instead, it is calculated using: Var(X) = E(X²) − μ².
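The theorem is the standard expansion of the definition; a short derivation in LaTeX, using linearity of expectation and E(X) = μ:

```latex
\operatorname{Var}(X) = E\!\left[(X-\mu)^2\right]
                      = E\!\left[X^2 - 2\mu X + \mu^2\right]
                      = E(X^2) - 2\mu\,E(X) + \mu^2
                      = E(X^2) - \mu^2 .
```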

  7. Hypergeometric (no replacement). For an urn w/ r red chips & w white chips, select n chips without replacement. Set X = # of red chips selected. Recall E(X) = nr/(r+w). E.g., for r = 3, w = 7, and n = 5: on average E(X) = 5·3/(3+7) = 15/10 = 3/2 = 1.5 red chips will be selected.
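As a quick check (assuming scipy is available; note that scipy's hypergeom uses M = population size, n = # of red chips, N = # of draws, which differs from the slide's notation):

```python
from scipy.stats import hypergeom

r, w, draws = 3, 7, 5
X = hypergeom(M=r + w, n=r, N=draws)
print(X.mean())  # 1.5 = n*r/(r+w) = 5*3/10
```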

  8. Binomial (replacement). For an urn w/ r red chips & w white chips, select a chip, record its color, and then put it back into the urn; repeat n times. Let X = # of red chips selected. Each time a chip is selected, the chance of selecting a red one is p = r/(r+w). Therefore, on average E(X) = np = nr/(r+w) red chips will be selected. Note: this is the same result as we got for the hypergeometric. Do you think the variance will also be the same for the 2 distributions? If not, which one will be bigger? Why?
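The matching binomial mean, checked the same way (again assuming scipy):

```python
from scipy.stats import binom

print(binom(n=5, p=3 / 10).mean())  # 1.5 = np: same mean as without replacement
```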

  9. Variance properties. 1. Given Y = aX + b, then Var(Y) = a²·Var(X): variance is affected by scaling but not by shifting. 2. If X and Y are independent, then V(X+Y) = V(X) + V(Y), and this extends to multiple summands as well. 3. If the {Xᵢ} are independent and W = X₁ + X₂ + ⋯ + Xₙ, then V(W) = V(X₁) + V(X₂) + ⋯ + V(Xₙ).
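A quick numerical sanity check of properties 1 and 2 (a sketch assuming numpy; sample variances only approximate the population quantities):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)  # drawn independently of x

a, b = 3.0, 7.0
print(np.var(a * x + b), a**2 * np.var(x))   # property 1: scaling, not shifting
print(np.var(x + y), np.var(x) + np.var(y))  # property 2: variances add
```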

  10. Returning to binomial variance. For each trial i, Xᵢ and Xᵢ² are identical as distributions:

      k           0        1
      P(Xᵢ = k)   q = 1−p  p

      (Xᵢ² takes the same values 0, 1 with the same probabilities.) Hence Var(Xᵢ) = E(Xᵢ²) − μ² = p − p² = p(1−p) = pq. In the case of the urn, Var(Xᵢ) = (r/(r+w))·(w/(r+w)) = rw/(r+w)². The binomial (but NOT the hypergeometric) is a sum of n independent trials, so V(W) = V(X₁) + V(X₂) + ⋯ + V(Xₙ) = n·rw/(r+w)² (or in general npq). If n = 5, r = 3, and w = 7, then W = X₁ + X₂ + ⋯ + X₅ and Var(W) = 5·3·7/(3+7)² = 105/100 = 1.05.

  11. Binomial vs. hypergeometric (cont.) Unfortunately, we do not have the tools to do the hypergeometric in general, so let's focus on the running example: n = 5, r = 3, w = 7. With P(W = k) = C(3,k)·C(7,5−k)/C(10,5), e.g. P(W = 0) = (7·6·5·4·3)/(10·9·8·7·6) = 1/12:

      W:   k     0     1     2     3
           P(k)  1/12  5/12  5/12  1/12

      W²:  k²    0     1     4     9
           P(k)  1/12  5/12  5/12  1/12

      So E(W²) = (0·1 + 1·5 + 4·5 + 9·1)/12 = 34/12, and Var(W) = E(W²) − μ² = 34/12 − (3/2)² = 34/12 − 27/12 = 7/12 ≈ 0.58, which is quite a bit less than in the binomial situation. What we have witnessed: with dependency, variation is reduced.
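Both variances can be checked exactly (assuming scipy):

```python
from scipy.stats import binom, hypergeom

print(hypergeom(M=10, n=3, N=5).var())  # 0.5833... = 7/12 (without replacement)
print(binom(n=5, p=3 / 10).var())       # 1.05 = npq       (with replacement)
```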

  12. Continuous example 1 (uniform): f(t) = 1, 0 ≤ t ≤ 1. Recall μ = 1/2. E(Y²) = ∫₀¹ t² dt = (1/3)t³ |₀¹ = 1/3. Hence Var(Y) = E(Y²) − μ² = 1/3 − 1/4 = 1/12, and σ = √(1/12) ≈ .29. In words, a random outcome will on average be about .3 units from the mean. (Not quite correct; in actuality a random outcome will be on average .25 from μ.)

  13. Continuous example 2: f(t) = 3t², 0 ≤ t ≤ 1. Recall μ = ∫₀¹ t·3t² dt = (3/4)t⁴ |₀¹ = 3/4. E(Y²) = ∫₀¹ t²·3t² dt = (3/5)t⁵ |₀¹ = 3/5. Hence Var(Y) = E(Y²) − μ² = 3/5 − 9/16 = 3/80, and σ = √(3/80) ≈ .19 < .29 (the σ for the uniform), demonstrating: if the distribution gets concentrated in one area, then variance is reduced.
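Both continuous examples, verified symbolically (a sketch assuming sympy):

```python
import sympy as sp

t = sp.symbols('t')

for f in (sp.Integer(1), 3 * t**2):  # uniform density, then f(t) = 3t^2
    mu = sp.integrate(t * f, (t, 0, 1))              # E(Y)
    var = sp.integrate(t**2 * f, (t, 0, 1)) - mu**2  # E(Y^2) - mu^2
    print(mu, var, sp.sqrt(var).evalf(2))

# prints: 1/2 1/12 0.29   and then   3/4 3/80 0.19
```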

  14. Higher moments. Multiply the density function by successive powers of y and integrate: μₖ = ∫ yᵏ f_Y(y) dy. Expectation is the 1st moment: μ = μ₁. Variance is a combination of the first two moments: σ² = μ₂ − μ₁², or alternatively the 2nd moment about the mean: σ² = ∫ (y − μ)² f_Y(y) dy. [Do you see a parallel with Maclaurin and Taylor series?]
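The same machinery, phrased as a k-th-moment helper (assuming sympy; raw_moment is a name introduced here for illustration):

```python
import sympy as sp

t = sp.symbols('t')

def raw_moment(f, k):
    # k-th raw moment mu_k of a density f supported on [0, 1]
    return sp.integrate(t**k * f, (t, 0, 1))

f = 3 * t**2
print(raw_moment(f, 2) - raw_moment(f, 1)**2)  # 3/80: variance as mu_2 - mu_1^2
```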

  15. Shape statistics: skewness. A dimensionless version of the 3rd moment about the mean is skewness: γ₁ = μ₃/σ³ = E[(W − μ)³]/σ³. In particular, if a distribution is symmetric, then γ₁ = 0 (the converse does not hold in general). Examples. Positively skewed: exponential, γ₁ = 2 (show!). Negatively skewed: binomial with p > 1/2, γ₁ = (1 − 2p)/√(npq) (show!). [For p = 3/4, q = 1/4, n = 3: γ₁ = (1 − 3/2)/√(3·(3/4)·(1/4)) = (−1/2)/(3/4) = −2/3.]
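Both skewness claims can be checked against scipy's built-in moments (assuming scipy):

```python
from scipy.stats import binom, expon

print(expon.stats(moments='s'))               # 2.0: positively skewed
print(binom(n=3, p=0.75).stats(moments='s'))  # -0.667 = -2/3: negatively skewed
```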

  16. Another negatively skewed example. Recall continuous example 2: f(t) = 3t², with μ = 3/4 and σ² = 3/80. It is not symmetric, so skewness ≠ 0. E((Y − μ)³) = ∫₀¹ (t − 3/4)³·3t² dt = 3 ∫₀¹ (t⁵ − (9/4)t⁴ + (27/16)t³ − (27/64)t²) dt = 3(1/6 − 9/20 + 27/64 − 9/64) = −1/160. Hence γ₁ = E[(Y − μ)³]/σ³ = (−1/160)/(3/80)^(3/2) ≈ −0.86. [This example took me over an hour to do, as I made a simple mistake. I suggest that you make use of a CAS like Maple to check each step as you go along.] Exercise: show that for f(t) = 2t, 0 ≤ t ≤ 1, γ₁ = (−1/135)/(1/18)^(3/2) ≈ −0.57.
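In that spirit, a CAS check of both computations, written with sympy rather than Maple (skewness is a helper name introduced here):

```python
import sympy as sp

t = sp.symbols('t')

def skewness(f):
    # gamma_1 = E[(Y - mu)^3] / sigma^3 for a density f on [0, 1]
    mu = sp.integrate(t * f, (t, 0, 1))
    var = sp.integrate((t - mu)**2 * f, (t, 0, 1))
    m3 = sp.integrate((t - mu)**3 * f, (t, 0, 1))
    return m3 / var**sp.Rational(3, 2)

print(skewness(3 * t**2).evalf(2))  # -0.86 (m3 = -1/160, sigma^2 = 3/80)
print(skewness(2 * t).evalf(2))     # -0.57 (m3 = -1/135, sigma^2 = 1/18)
```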

  17. Shape statistics: kurtosis. A dimensionless version of the 4th moment about the mean, shifted down by 3, is kurtosis: γ₂ = μ₄/σ⁴ − 3 = E[(W − μ)⁴]/σ⁴ − 3. The shift by 3 is so that the normal distribution has γ₂ = 0. For the uniform distribution f(y) = 1, 0 ≤ y ≤ 1: E((Y − μ)⁴) = ∫₀¹ (t − 0.5)⁴ dt = (1/5)(t − 0.5)⁵ |₀¹ = 1/80. Hence γ₂ = E[(Y − μ)⁴]/σ⁴ − 3 = (1/80)/(1/12)² − 3 = 9/5 − 3 = −1.2.
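scipy reports the same "minus 3" (excess) convention, so the value can be checked directly (assuming scipy):

```python
from scipy.stats import uniform

print(uniform.stats(moments='k'))  # -1.2 = (1/80)/(1/12)^2 - 3
```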

  18. Some more distributions w/ significant kurtosis. Semicircle: γ₂ = −1. Laplace: γ₂ = 3. All the distributions within each diagram have the same kurtosis, demonstrating that when we divide by σ⁴, we eliminate any effect of scaling.
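Both quoted values, checked with scipy (assuming scipy; semicircular is scipy's Wigner semicircle distribution):

```python
from scipy.stats import laplace, semicircular

print(semicircular.stats(moments='k'))  # -1.0
print(laplace.stats(moments='k'))       # 3.0
```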

  19. Kurtosis for distributions with finite support: γ₂ > 0 if highly peaked; γ₂ < 0 if the distribution has shoulders (a substantial part of the distribution about one σ from the mean).
