
Bivariate Random Variables and Probabilities Analysis
Explore the concepts of bivariate random variables, joint probabilities, and independence in statistics and data analysis. Learn about inherited color blindness probabilities, independent random variables, and the definition of independence in statistical analysis.
Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics 6-1/46 Part 6: Bivariate Random Variables
Statistics and Data Analysis, Part 6: Bivariate Random Variables and Correlation

Probabilities for Two Events, A and B
Marginal probability: the probability of an event, not considering any other events. P(A)
Joint probability: the probability that two events happen at the same time. P(A,B)
Conditional probability: the probability that one event happens given that another event has happened. P(A|B)

Probabilities: Inherited Color Blindness
Inherited color blindness has different incidence rates in men and women. Women usually carry the defective gene and men usually inherit it.
Pick an individual at random from the population. CB = has inherited color blindness; MALE, FEMALE = gender.
Marginal: P(CB) = 2.75%
Conditional: P(CB|MALE) = 5.0% (1 in 20 men); P(CB|FEMALE) = 0.5% (1 in 200 women)
Joint: P(CB and MALE) = 2.5%; P(CB and FEMALE) = 0.25%

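As a rough check on how these figures fit together, the sketch below rebuilds the joint and marginal probabilities from the two conditional rates. It assumes P(MALE) = 0.5, which is consistent with the joint figures quoted above; the variable names are illustrative only.

```python
# Rates taken from the slide; p_male = 0.5 is an assumption of this sketch.
p_male = 0.5
p_cb_given_male = 0.05      # 1 in 20 men
p_cb_given_female = 0.005   # 1 in 200 women

# Joint probability = conditional probability x marginal probability of the condition.
p_cb_and_male = p_cb_given_male * p_male
p_cb_and_female = p_cb_given_female * (1 - p_male)

# Marginal probability = sum of the joint probabilities over gender.
p_cb = p_cb_and_male + p_cb_and_female

print(f"P(CB and MALE)   = {p_cb_and_male:.4f}")
print(f"P(CB and FEMALE) = {p_cb_and_female:.4f}")
print(f"P(CB)            = {p_cb:.4f}")
```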
Independent Random Variables
One card is drawn randomly from a deck of 52 cards.

                Ace
Heart    Yes=1   No=0    Total
Yes=1    1/52    12/52   13/52
No=0     3/52    36/52   39/52
Total    4/52    48/52   52/52

P(Ace|Heart) = 1/13, P(Ace|~Heart) = 3/39 = 1/13, P(Ace) = 4/52 = 1/13.
P(Ace) does not depend on whether the card is a heart or not.
P(Heart|Ace) = 1/4, P(Heart|~Ace) = 12/48 = 1/4, P(Heart) = 13/52 = 1/4.
P(Heart) does not depend on whether the card is an ace or not.

Independence
Random variables are independent if the occurrence of one does not affect the probability distribution of the other. If P(Y|X) does not change when X changes, then the variables are independent.

Equivalent Definition of Independence
Random variables X and Y are independent if \( P_{XY}(x,y) = P_X(x)\,P_Y(y) \). The joint probability equals the product of the marginal probabilities.

Independent Events

                Ace
Heart    Yes=1          No=0    Total
Yes=1    1/52           12/52   13/52 = 1/4
No=0     3/52           36/52   39/52
Total    4/52 = 1/13    48/52   52/52

P(Ace,Heart) = 1/52
P(Ace) = 1/13, P(Heart) = 1/4
P(Ace) x P(Heart) = (1/13)(1/4) = 1/52
Ace and Heart are independent.

Not Independent Events

               Color Blind
Gender    No       Yes      Total
Male      .475     .025     .50
Female    .4975    .0025    .50
Total     .9725    .0275    1.00

P(Color blind, Male) = .025
P(Male) = .500, P(Color blind) = .0275
P(Color blind) x P(Male) = .500 x .0275 = .01375
.01375 is not equal to .025.
Gender and color blindness are not independent.

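The product test for independence can be sketched in a few lines. The helper names `marginals` and `is_independent` are made up for this illustration, and exact fractions are used so that the equality check is not disturbed by floating-point rounding.

```python
from fractions import Fraction

def marginals(joint):
    """Return the two marginal distributions of a joint table keyed by (x, y)."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y

def is_independent(joint):
    """True if every joint probability equals the product of its marginals."""
    p_x, p_y = marginals(joint)
    return all(p == p_x[x] * p_y[y] for (x, y), p in joint.items())

# One card from a 52-card deck, keyed by (heart, ace).
cards = {
    (1, 1): Fraction(1, 52), (1, 0): Fraction(12, 52),
    (0, 1): Fraction(3, 52), (0, 0): Fraction(36, 52),
}

# Gender and color blindness, keyed by (gender, color_blind).
color_blind = {
    ("male", 0): Fraction("0.475"), ("male", 1): Fraction("0.025"),
    ("female", 0): Fraction("0.4975"), ("female", 1): Fraction("0.0025"),
}

print(is_independent(cards))        # True
print(is_independent(color_blind))  # False
```

Checking every cell, rather than only one, matters in general: a single cell can satisfy the product rule by coincidence even when the variables are dependent.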
Two Important Math Results
For two random variables, P(X,Y) = P(X|Y) P(Y).
  P(Color blind, Male) = P(Color blind|Male) P(Male) = .05 x .5 = .025
For two independent random variables, P(X,Y) = P(X) P(Y).
  P(Ace,Heart) = P(Ace) x P(Heart). (This does not work if they are not independent.)

Conditional Probability
Prob(A | B) = P(A,B) / P(B)

               Color Blind
Gender    No       Yes      Total
Male      .475     .025     .50
Female    .4975    .0025    .50
Total     .9725    .0275    1.00

Prob(Color Blind | Male) = Prob(Color Blind, Male) / P(Male) = .025 / .50 = .05
What is P(Male | Color Blind)?

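One way to work the closing question is to apply the same definition with the roles of the two events swapped, dividing the joint probability by P(Color Blind) instead of P(Male). A minimal sketch, using only the table entries above:

```python
# Table entries from the slide.
p_cb_and_male = 0.025
p_male = 0.50
p_cb = 0.0275

# Prob(A | B) = P(A, B) / P(B), applied in both directions.
p_cb_given_male = p_cb_and_male / p_male
p_male_given_cb = p_cb_and_male / p_cb

print(f"P(Color Blind | Male) = {p_cb_given_male:.3f}")
print(f"P(Male | Color Blind) = {p_male_given_cb:.3f}")
```

The two conditional probabilities are very different (.05 versus roughly .909), a reminder that P(A|B) and P(B|A) are not interchangeable.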
Conditional Distributions

Distribution                            Color Blind   Not Color Blind
Overall                                 .0275         .9725
Among men (conditioned on Male)         .05           .95
Among women (conditioned on Female)     .005          .995

The distribution changes given gender.

Application
Legal case mix: two kinds of cases show up each month, real estate (R) and financial (F) (sometimes together, usually separately).
R = real estate cases, F = financial cases. The P(F) column is the marginal distribution for financial cases and the P(R) row is the marginal distribution for real estate cases. The joint distribution occupies the interior of the table.

                     Real Estate (R)
Financial (F)    0      1      2      3      P(F)
0                                            .20
1                                            .33
2                                            .47
P(R)             .09    .16    .22    .53    1.00

Legal Services Case Mix
Joint discrete distribution. R = real estate cases, F = financial cases.

                     Real Estate (R)
Financial (F)    0      1      2      3      P(F)
0                .02    .05    .05    .08    .20
1                .03    .05    .08    .17    .33
2                .04    .06    .09    .28    .47
P(R)             .09    .16    .22    .53    1.00

Joint probabilities are Prob(F=f and R=r).
Note that marginal probabilities are obtained by summing across or down.

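The "summing across or down" step can be sketched as follows. The nested-list layout (rows indexed by F, columns by R) is a choice made for this illustration.

```python
# Joint distribution P(F=f, R=r): rows are F = 0..2, columns are R = 0..3.
joint = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]

# Summing across each row gives the marginal distribution of F.
p_f = [sum(row) for row in joint]

# Summing down each column gives the marginal distribution of R.
p_r = [sum(col) for col in zip(*joint)]

print("P(F):", [round(p, 2) for p in p_f])
print("P(R):", [round(p, 2) for p in p_r])
```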
Legal Services Case Mix: Conditional Distributions
R = real estate cases, F = financial cases.

                     Real Estate (R)
Financial (F)    0                1                2                3                P(F)
0                .02/.20 = .10    .05/.20 = .25    .05/.20 = .25    .08/.20 = .40    .20
1                .03/.33 = .10    .05/.33 = .15    .08/.33 = .24    .17/.33 = .51    .33
2                .04/.47 = .09    .06/.47 = .13    .09/.47 = .19    .28/.47 = .59    .47

Read across the rows: each row gives the probabilities for R given the value of F.
Conditional probabilities are Prob(R=r and F=f) / P(F=f).

Conditional Distributions
The probability distribution of real estate cases (R) given financial cases (F) varies with the number of financial cases. The probability that (R=3)|F goes up as F increases from 0 to 2. This means that the variables are not independent.

Conditional Probabilities for Real Estate Cases
R|F            0      1      2      3      Total
Financial=0    .10    .25    .25    .40    1.00
Financial=1    .10    .15    .24    .51    1.00
Financial=2    .09    .13    .19    .59    1.00

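A short sketch of how the conditional rows can be produced from the joint table: divide each row by its own total. Because nothing is rounded along the way, the printed values may differ from the slide's two-decimal entries in the last digit.

```python
# Joint distribution P(F=f, R=r): rows are F = 0..2, columns are R = 0..3.
joint = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]

# P(R=r | F=f) = P(F=f, R=r) / P(F=f), where P(F=f) is the row total.
cond_r_given_f = [[p / sum(row) for p in row] for row in joint]

for f, dist in enumerate(cond_r_given_f):
    print(f"Financial={f}:", [round(p, 3) for p in dist])
```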
Covariation
Pick 10,325 people at random from the population. Predict how many will be color blind: 10,325 x .0275 = 284
Pick 10,325 MEN at random from the population. Predict how many will be color blind: 10,325 x .05 = 516
Pick 10,325 WOMEN at random from the population. Predict how many will be color blind: 10,325 x .005 = 52
The expected number of color blind people, given gender, depends on gender.
Color blindness covaries with gender.

Covariation in Legal Services

Real Estate Cases    0      1      2      3
Financial=0          .10    .25    .25    .40
Financial=1          .10    .15    .24    .51
Financial=2          .09    .13    .19    .59

These are the conditional distributions P(R|F).
How many real estate cases should the office expect if it knows (or predicts) the number of financial cases?
E[R if F=0] = 0(.10) + 1(.25) + 2(.25) + 3(.40) = 1.95 (less than 2)
E[R if F=1] = 0(.10) + 1(.15) + 2(.24) + 3(.51) = 2.16 (more than 2)
E[R if F=2] = 0(.09) + 1(.13) + 2(.19) + 3(.59) = 2.28 (more than 2)
This is how R and F covary.

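The conditional means can also be computed straight from the joint table, as in the sketch below. Without intermediate rounding they come out near 1.95, 2.18 and 2.30; the slide's 2.16 and 2.28 reflect its two-decimal table entries. The pattern is the same either way: the expected number of real estate cases rises with F.

```python
# Joint distribution P(F=f, R=r): rows are F = 0..2, columns are R = 0..3.
joint = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]

# E[R | F=f] = sum over r of r * P(F=f, R=r) / P(F=f).
cond_mean_r = [
    sum(r * p for r, p in enumerate(row)) / sum(row)
    for row in joint
]

for f, m in enumerate(cond_mean_r):
    print(f"E[R | F={f}] = {m:.4f}")
```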
Covariation and Regression
[Figure: Expected number of real estate cases given number of financial cases (the regression of R on F). Horizontal axis: financial cases, 0 to 2. Vertical axis: expected real estate cases, 1.9 to 2.4.]

Measuring How Variables Move Together: Covariance
\[ \mathrm{Cov}(X,Y) = \sum_{\text{values of } X} \; \sum_{\text{values of } Y} P(x,y)\,(x-\mu_X)(y-\mu_Y) \]
Covariance can be positive or negative. The measure will be positive if it is likely that Y is above its mean when X is above its mean. It is usually denoted \( \sigma_{XY} \).

Legal Services Case Mix: Covariance
The two means are
\( \mu_R \) = 0(.09) + 1(.16) + 2(.22) + 3(.53) = 2.19
\( \mu_F \) = 0(.20) + 1(.33) + 2(.47) = 1.27

Compute the covariance as the sum over F and R of (F-1.27)(R-2.19)P(F,R), using the joint distribution in the Legal Services Case Mix table:
(0-1.27)(0-2.19).02 = +.055626
(0-1.27)(1-2.19).05 = +.075565
(0-1.27)(2-2.19).05 = +.012065
(0-1.27)(3-2.19).08 = -.082296
(1-1.27)(0-2.19).03 = +.017739
(1-1.27)(1-2.19).05 = +.016065
(1-1.27)(2-2.19).08 = +.004104
(1-1.27)(3-2.19).17 = -.037179
(2-1.27)(0-2.19).04 = -.063948
(2-1.27)(1-2.19).06 = -.052122
(2-1.27)(2-2.19).09 = -.012483
(2-1.27)(3-2.19).28 = +.165564
Sum = +0.09870

Covariance and Scaling
Cov(R,F) = +0.09870, with \( \mu_R \) = 0(.09) + 1(.16) + 2(.22) + 3(.53) = 2.19 and \( \mu_F \) = 0(.20) + 1(.33) + 2(.47) = 1.27.
What does the covariance mean?
Suppose each real estate case requires 2 lawyers and each financial case requires 3 lawyers. Then the number of lawyers is NR = 2R and NF = 3F. The covariance of NR and NF will be 3(2)(.0987) = 0.5922. But the relationship is the same.

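The scaling effect is easy to reproduce. The sketch below computes the covariance once for case counts and once for lawyer counts; `covariance` is a helper written for this illustration, not a library function.

```python
# Joint distribution P(F=f, R=r): rows are F = 0..2, columns are R = 0..3.
joint = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]

def covariance(joint, x_vals, y_vals):
    """Covariance of X (row values) and Y (column values) under a joint table."""
    mean_x = sum(x * p for x, row in zip(x_vals, joint) for p in row)
    mean_y = sum(y * p for row in joint for y, p in zip(y_vals, row))
    return sum(
        (x - mean_x) * (y - mean_y) * p
        for x, row in zip(x_vals, joint)
        for y, p in zip(y_vals, row)
    )

cov_cases = covariance(joint, [0, 1, 2], [0, 1, 2, 3])     # Cov(F, R)
cov_lawyers = covariance(joint, [0, 3, 6], [0, 2, 4, 6])   # Cov(3F, 2R)

print(f"Cov(F, R)   = {cov_cases:.4f}")
print(f"Cov(3F, 2R) = {cov_lawyers:.4f}")
```

The probabilities are untouched; only the values attached to each cell are rescaled, which multiplies the covariance by 3 x 2 = 6.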
Independent Random Variables Have Zero Covariance
One card is drawn randomly from a deck of 52 cards. A = Ace, H = Heart.

                A = Ace
H = Heart    Yes=1   No=0    Total
Yes=1        1/52    12/52   13/52
No=0         3/52    36/52   39/52
Total        4/52    48/52   52/52

E[H] = 1(13/52) + 0(39/52) = 1/4
E[A] = 1(4/52) + 0(48/52) = 1/13
Covariance = \( \sum_H \sum_A (H-\mu_H)(A-\mu_A)\,P(H,A) \)
(1 - 1/4)(1 - 1/13)(1/52)  = +36/52^2
(0 - 1/4)(1 - 1/13)(3/52)  = -36/52^2
(1 - 1/4)(0 - 1/13)(12/52) = -36/52^2
(0 - 1/4)(0 - 1/13)(36/52) = +36/52^2
SUM = 0 !!

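The four terms can be checked with exact arithmetic. This is a minimal sketch using Python's `fractions` module, with the table keyed by (H, A):

```python
from fractions import Fraction

# Joint probabilities for one card from a 52-card deck, keyed by (heart, ace).
joint = {
    (1, 1): Fraction(1, 52), (1, 0): Fraction(12, 52),
    (0, 1): Fraction(3, 52), (0, 0): Fraction(36, 52),
}

mean_h = sum(h * p for (h, a), p in joint.items())
mean_a = sum(a * p for (h, a), p in joint.items())

# One covariance term per cell of the table.
terms = {
    (h, a): (h - mean_h) * (a - mean_a) * p
    for (h, a), p in joint.items()
}
cov_ha = sum(terms.values())

print("E[H] =", mean_h, " E[A] =", mean_a)
print("Cov(H, A) =", cov_ha)
```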
A Shortcut for Covariance
\[ \mathrm{Cov}(X,Y) = \sum_{\text{values of } X} \; \sum_{\text{values of } Y} P(x,y)\,(x-\mu_X)(y-\mu_Y) = \Big[ \sum_{\text{values of } X} \; \sum_{\text{values of } Y} P(x,y)\,x\,y \Big] - \mu_X\,\mu_Y \]

Computing the Covariance Using the Shortcut
Compute the covariance as [sum over F and R of F x R x P(F,R)] - [\( \mu_F \mu_R \)]:
(0)(0).02 = 0
(0)(1).05 = 0
(0)(2).05 = 0
(0)(3).08 = 0
(1)(0).03 = 0
(1)(1).05 = .05
(1)(2).08 = .16
(1)(3).17 = .51
(2)(0).04 = 0
(2)(1).06 = .12
(2)(2).09 = .36
(2)(3).28 = 1.68
Sum = 2.88
2.88 - (1.27)(2.19) = 0.09870
This matches the sum of the twelve terms (F-1.27)(R-2.19)P(F,R), which is also +0.09870.

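Both routes to the covariance can be compared side by side. The sketch below evaluates the definition and the shortcut on the legal services table; the variable names are illustrative.

```python
# Joint distribution P(F=f, R=r): rows are F = 0..2, columns are R = 0..3.
joint = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]

# Flatten the table into (f, r, probability) cells.
cells = [(f, r, p) for f, row in enumerate(joint) for r, p in enumerate(row)]

mean_f = sum(f * p for f, r, p in cells)
mean_r = sum(r * p for f, r, p in cells)

# Definition: probability-weighted sum of products of deviations from the means.
cov_definition = sum((f - mean_f) * (r - mean_r) * p for f, r, p in cells)

# Shortcut: E[F*R] minus the product of the means.
e_fr = sum(f * r * p for f, r, p in cells)
cov_shortcut = e_fr - mean_f * mean_r

print(f"E[FR] = {e_fr:.2f}")
print(f"definition: {cov_definition:.5f}  shortcut: {cov_shortcut:.5f}")
```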
Covariance and Units of Measurement
Covariance takes the units of (units of X) times (units of Y).
Consider Cov($Price of X, $Price of Y). Now, measure both prices in GBP (roughly $1.60 per £). The prices are divided by 1.60, and the covariance is divided by 1.60^2. This is an unattractive result.

Correlation is Units Free
Correlation coefficient:
\[ \rho_{XY} = \frac{\mathrm{Covariance}(x,y)}{\text{Standard deviation}(x)\;\text{Standard deviation}(y)}, \qquad -1.00 \le \rho_{XY} \le +1.00 \]

Correlation
\( \mu_R \) = 2.19, \( \mu_F \) = 1.27, Covariance = +0.09870 (joint distribution as in the Legal Services Case Mix table).
Var(F) = 0^2(.20) + 1^2(.33) + 2^2(.47) - 1.27^2 = 0.5971; standard deviation = .77272
Var(R) = 0^2(.09) + 1^2(.16) + 2^2(.22) + 3^2(.53) - 2.19^2 = 1.0139; standard deviation = 1.006926
Correlation = .0987 / (.77272 x 1.006926) = 0.12685

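The variances and the correlation can be recomputed from the joint table in a few lines. Evaluated this way, the sketch gives Var(F) = 0.5971, Var(R) = 1.0139 and a correlation of about 0.1269. The helper `mean_and_var` is defined here for illustration.

```python
import math

# Joint distribution P(F=f, R=r): rows are F = 0..2, columns are R = 0..3.
joint = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]

p_f = [sum(row) for row in joint]          # marginal distribution of F
p_r = [sum(col) for col in zip(*joint)]    # marginal distribution of R

def mean_and_var(probs):
    """Mean and variance of a variable taking values 0, 1, 2, ... with these probabilities."""
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2
    return mean, var

mean_f, var_f = mean_and_var(p_f)
mean_r, var_r = mean_and_var(p_r)

e_fr = sum(f * r * p for f, row in enumerate(joint) for r, p in enumerate(row))
cov_fr = e_fr - mean_f * mean_r
corr_fr = cov_fr / math.sqrt(var_f * var_r)

print(f"Var(F) = {var_f:.4f}, Var(R) = {var_r:.4f}")
print(f"Cov = {cov_fr:.4f}, correlation = {corr_fr:.5f}")
```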
Aspect of Correlation
Independence implies zero correlation. If the variables are independent, then the numerator of the correlation coefficient is 0.

Sums of Two Random Variables
Example 1: Total number of cases = F+R
Example 2: Personnel needed = 3F+2R
Find for sums: expected value, variance and standard deviation
Application from finance: portfolio

Math Facts 1: Mean of a Sum
Mean of a sum: E[X+Y] = E[X] + E[Y]
Mean of a weighted sum: E[aX + bY] = E[aX] + E[bY] = aE[X] + bE[Y]

Mean of a Sum
\( \mu_R \) = 2.19, \( \mu_F \) = 1.27 (joint distribution as in the Legal Services Case Mix table).
What is the mean (expected) number of cases each month, R+F?
E[R + F] = E[R] + E[F] = 2.19 + 1.27 = 3.46

Mean of a Weighted Sum
\( \mu_R \) = 2.19, \( \mu_F \) = 1.27
Suppose each real estate case requires 2 lawyers and each financial case requires 3 lawyers. Then NR = 2R and NF = 3F, and the mean number of lawyers is the mean of 2R+3F.
E[2R + 3F] = 2E[R] + 3E[F] = 2(2.19) + 3(1.27) = 8.19 lawyers required.

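The linearity rule can be checked against a direct calculation over all twelve cells of the joint table. A minimal sketch:

```python
# Joint distribution P(F=f, R=r): rows are F = 0..2, columns are R = 0..3.
joint = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]
cells = [(f, r, p) for f, row in enumerate(joint) for r, p in enumerate(row)]

mean_f = sum(f * p for f, r, p in cells)
mean_r = sum(r * p for f, r, p in cells)

# Direct: expected lawyers, weighting 2r + 3f by each joint probability.
mean_lawyers_direct = sum((2 * r + 3 * f) * p for f, r, p in cells)

# Linearity: E[2R + 3F] = 2E[R] + 3E[F].
mean_lawyers_linear = 2 * mean_r + 3 * mean_f

print(f"direct = {mean_lawyers_direct:.2f}, by linearity = {mean_lawyers_linear:.2f}")
```

Note that the mean of a weighted sum needs only the two marginal means; the covariance plays no role until the variance is computed.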
Math Facts 2: Variance of a Sum
Variance of a sum: Var[x+y] = Var[x] + Var[y] + 2Cov(x,y)
The variance of a sum equals the sum of the variances only if the variables are uncorrelated.
Standard deviation of a sum: the standard deviation of x+y is not equal to the sum of the standard deviations.
\[ \sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2 + 2\sigma_{xy}} \]

Variance of a Sum
\( \mu_R \) = 2.19, \( \sigma_R^2 \) = 1.0139; \( \mu_F \) = 1.27, \( \sigma_F^2 \) = 0.5971; \( \sigma_{RF} \) = 0.0987
What is the variance of the total number of cases that occur each month?
This is the variance of F+R = 1.0139 + 0.5971 + 2(.0987) = 1.8084.
The standard deviation is 1.34477.

Math Facts 3: Variance of a Weighted Sum
Var[ax+by] = Var[ax] + Var[by] + 2Cov(ax,by) = a^2 Var[x] + b^2 Var[y] + 2ab Cov(x,y).
Also, Cov(x,y) is the numerator in \( \rho_{xy} \), so Cov(x,y) = \( \rho_{xy}\,\sigma_x\,\sigma_y \).
\[ \sigma_{ax+by} = \sqrt{a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,\rho_{xy}\,\sigma_x\,\sigma_y} \]

Variance of a Weighted Sum
\( \mu_R \) = 2.19, \( \mu_F \) = 1.27, \( \sigma_{RF} \) = 0.0987, \( \rho_{RF} \) = .12685, \( \sigma_R^2 \) = 1.0139, \( \sigma_F^2 \) = 0.5971
Suppose each real estate case requires 2 lawyers and each financial case requires 3 lawyers. Then NR = 2R and NF = 3F. What is the variance of the total number of lawyers needed each month? What is the standard deviation?
This is the variance of 2R+3F = 2^2(1.0139) + 3^2(0.5971) + 2(2)(3)(.12685)(1.006926)(.77272) = 10.6139.
The standard deviation is the square root, 3.2579.

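The weighted-sum formula can be cross-checked by computing the variance of aR + bF directly from the joint table. In this sketch, `direct_variance` and `formula_variance` are illustrative helpers; evaluated on the table, both give about 1.8084 for R+F and about 10.6139 for 2R+3F.

```python
import math

# Joint distribution P(F=f, R=r): rows are F = 0..2, columns are R = 0..3.
joint = [
    [0.02, 0.05, 0.05, 0.08],
    [0.03, 0.05, 0.08, 0.17],
    [0.04, 0.06, 0.09, 0.28],
]
cells = [(f, r, p) for f, row in enumerate(joint) for r, p in enumerate(row)]

mean_f = sum(f * p for f, r, p in cells)
mean_r = sum(r * p for f, r, p in cells)
var_f = sum((f - mean_f) ** 2 * p for f, r, p in cells)
var_r = sum((r - mean_r) ** 2 * p for f, r, p in cells)
cov_rf = sum((f - mean_f) * (r - mean_r) * p for f, r, p in cells)

def direct_variance(a, b):
    """Variance of a*R + b*F taken over every cell of the joint table."""
    mean = sum((a * r + b * f) * p for f, r, p in cells)
    return sum((a * r + b * f - mean) ** 2 * p for f, r, p in cells)

def formula_variance(a, b):
    """a^2 Var[R] + b^2 Var[F] + 2ab Cov(R, F)."""
    return a * a * var_r + b * b * var_f + 2 * a * b * cov_rf

var_total_cases = formula_variance(1, 1)   # R + F
var_lawyers = formula_variance(2, 3)       # 2R + 3F

print(f"Var[R+F]   = {var_total_cases:.4f}, sd = {math.sqrt(var_total_cases):.5f}")
print(f"Var[2R+3F] = {var_lawyers:.4f}, sd = {math.sqrt(var_lawyers):.5f}")
```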
Application - Portfolio
You have $1000 to allocate between assets A and B. The yearly returns on the two assets are random variables rA and rB.
The means of the two returns are E[rA] = \( \mu_A \) and E[rB] = \( \mu_B \).
The standard deviations (risks) of the returns are \( \sigma_A \) and \( \sigma_B \).
The correlation of the two returns is \( \rho_{AB} \).

The two returns are positively correlated.

Portfolio
You have $1000 to allocate to A and B. You will allocate a proportion w of your $1000 to A and (1-w) to B.

Return and Risk
Your expected return on each dollar is
\[ E[w\,r_A + (1-w)\,r_B] = w\,\mu_A + (1-w)\,\mu_B \]
The variance of your return on each dollar is
\[ \mathrm{Var}[w\,r_A + (1-w)\,r_B] = w^2\sigma_A^2 + (1-w)^2\sigma_B^2 + 2w(1-w)\,\rho_{AB}\,\sigma_A\,\sigma_B \]
The standard deviation is the square root.

Risk and Return: Example
Suppose you know \( \mu_A \), \( \mu_B \), \( \rho_{AB} \), \( \sigma_A \), and \( \sigma_B \). (You have watched these stocks for over 6 years.) The mean and standard deviation are then just functions of w. I will then compute the mean and standard deviation for different values of w.
For our Microsoft and Walmart example, \( \mu_A \) = .050071, \( \mu_B \) = .021906, \( \sigma_A \) = .114264, \( \sigma_B \) = .086035, \( \rho_{AB} \) = .248634.
E[return] = w(.050071) + (1-w)(.021906) = .021906 + .028165w
SD[return] = sqrt[w^2(.114^2) + (1-w)^2(.086^2) + 2w(1-w)(.249)(.114)(.086)] = sqrt[.013w^2 + .0074(1-w)^2 + .00488w(1-w)]

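A sketch of the risk and return calculation for a few allocations, using the means, standard deviations and correlation quoted in this example. The function names and the grid of w values are choices made for this illustration.

```python
import math

# Inputs quoted in the example.
MU_A, MU_B = 0.050071, 0.021906
SD_A, SD_B = 0.114264, 0.086035
RHO_AB = 0.248634

def expected_return(w):
    """Expected return per dollar with proportion w in A and (1 - w) in B."""
    return w * MU_A + (1 - w) * MU_B

def risk(w):
    """Standard deviation of the return per dollar for the same allocation."""
    variance = (
        w ** 2 * SD_A ** 2
        + (1 - w) ** 2 * SD_B ** 2
        + 2 * w * (1 - w) * RHO_AB * SD_A * SD_B
    )
    return math.sqrt(variance)

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"w = {w:.2f}  E[return] = {expected_return(w):.6f}  SD[return] = {risk(w):.6f}")
```

At w = 0 and w = 1 the portfolio is a single asset, so the risk equals that asset's standard deviation. For allocations in between, the risk is below the weighted average of the two standard deviations because the correlation is less than 1.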
[Figure: risk and return for different values of w, with the endpoints labeled w=1 and w=0.]
For different values of w, risk = sqrt[.013w^2 + .0074(1-w)^2 + .00488w(1-w)] is on the horizontal axis and return = .021906 + .028165w is on the vertical axis.

Summary
Random variables
  Independent
  Conditional probabilities change with the values of dependent variables.
Covariation and the covariance as a measure (the regression)
Correlation as a units-free measure of covariation
Math results
  Mean of a weighted sum
  Variance of a weighted sum
Application to a portfolio problem