
Advanced Probability Concepts in Moments, Correlation, and Distributions
Explore advanced probability topics such as moments, correlation, probability models, entropy, and more. Learn about discrete and continuous probability distributions, joint probabilities, and expected values in a detailed manner.
Presentation Transcript
6. Advanced Probability
Section 6.1 Moment and Correlation
Section 6.2 Probability Model
Section 6.3 Entropy
Section 6.4 Kullback-Leibler Divergence (KL Divergence)
Section 6.5 Basic Concepts of Random Process
6.1 Moment and Correlation

6.1.1 Review for Probability

1. Discrete Case

(a) Probability (Probability Mass Function, PMF)

$P_X(n) = P(X = n) = \dfrac{\text{number of cases where } X = n}{\text{total number of cases}}$

Note: $\sum_n P_X(n) = 1$

(b) Joint Probability

$P_{X,Y}(n, m) = P(X = n \text{ and } Y = m) = \dfrac{\text{number of cases where } X = n \text{ and } Y = m}{\text{total number of cases}}$
(c) Conditional Probability

$P_{X|Y}(n|m) = P\big((X = n)\,|\,(Y = m)\big) = \dfrac{\text{number of cases where } X = n \text{ and } Y = m}{\text{number of cases where } Y = m}$

$P_{X|Y}(n|m) = \dfrac{P_{X,Y}(n, m)}{P_Y(m)}$

An Example of the Discrete Probability Distribution: the Binomial Distribution

$P_X(n) = \binom{N}{n} p^n (1-p)^{N-n}$ for $n = 0, 1, 2, \ldots, N$; $P_X(n) = 0$ otherwise.

Other examples are shown in Section 6.2.1.
2. Continuous Case

(a) Cumulative Distribution Function (CDF)

$F_X(x) = \mathrm{Prob}(X \le x)$

(b) Probability Density Function (PDF)

$f_X(x) = \dfrac{d}{dx} F_X(x)$

Note:
$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt = \mathrm{Prob}(X \le x)$
$\lim_{x \to \infty} F_X(x) = \int_{-\infty}^{\infty} f_X(x)\,dx = 1$
$f_X(x) = \lim_{\Delta \to 0} \dfrac{F_X(x + \Delta) - F_X(x - \Delta)}{2\Delta} = \lim_{\Delta \to 0} \dfrac{\mathrm{Prob}(x - \Delta < X \le x + \Delta)}{2\Delta}$
(c) Joint Cumulative Distribution Function

$F_{X,Y}(x, y) = \mathrm{Prob}\big((X \le x) \text{ and } (Y \le y)\big)$

(d) Joint Probability Density Function

$f_{X,Y}(x, y) = \dfrac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y)$

(e) Conditional Probability

$f_{X|Y}(x|y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$, where $f_Y(y) = \dfrac{d}{dy} F_Y(y)$

(f) Integral (Marginal) Probability

$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$,  $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$
Examples of the Probability Density Function in the Continuous Case

Uniform Distribution (quantization error is a special case with B = 0.5):

$f_X(x) = \dfrac{1}{2B}$ when $|x| < B$; $f_X(x) = 0$ otherwise.

[Figure: rectangular PDF of height $1/(2B)$ on $(-B, B)$.]

Normal Distribution ($-\infty < x < \infty$; $\mu$: mean, $\sigma$: standard deviation):

$f_X(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right)$

[Figure: bell curve for $\mu = 0$, $\sigma = 1$.]

Other examples are shown in Section 6.2.2.
3. Expected Value

(discrete case) $E[g(X)] = \sum_n g(n)\,P_X(n)$

(continuous case) $E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$

It is usually written as $E[g(X)] = \int g(x)\,dF_X(x)$.
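The discrete and continuous forms above can be evaluated numerically. A minimal sketch in Python with NumPy (the helper names are mine, not from the slides):

```python
import numpy as np

# Discrete case: E[g(X)] = sum_n g(n) * P_X(n)
def expected_value_discrete(g, values, pmf):
    return sum(g(n) * p for n, p in zip(values, pmf))

# Continuous case: E[g(X)] = integral of g(x) * f_X(x) dx,
# approximated here by a trapezoidal sum on a fine grid.
def expected_value_continuous(g, f, x):
    y = g(x) * f(x)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# E[X] for a fair six-sided die is 3.5.
mean_die = expected_value_discrete(lambda n: n, range(1, 7), [1/6] * 6)

# E[X^2] for X uniform on (0, 1) is 1/3.
grid = np.linspace(0.0, 1.0, 100_001)
m2 = expected_value_continuous(lambda t: t**2, lambda t: np.ones_like(t), grid)
```

The same two helpers are enough to reproduce every moment computation in Section 6.1.2.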
6.1.2 Moments

Mean
$\mu_X = E(X) = \sum_n n\,P_X(n)$ (discrete case)
$\mu_X = E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$ (continuous case)

Variance (var)
$\mathrm{var}_X = E\big((X - \mu_X)^2\big) = \sum_n (n - \mu_X)^2 P_X(n)$ (discrete case)
$\mathrm{var}_X = E\big((X - \mu_X)^2\big) = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx$ (continuous case)

Standard Deviation (std)
$\sigma_X = \sqrt{\mathrm{var}_X} = \sqrt{E\big((X - \mu_X)^2\big)}$ (discrete and continuous cases)
[Example 1]

(1) $P_X(1) = 0.5$, $P_X(2) = 0.5$:
$\mu_X = 0.5 \cdot 1 + 0.5 \cdot 2 = 1.5$
$\mathrm{var}_X = 0.5\,(1 - 1.5)^2 + 0.5\,(2 - 1.5)^2 = 0.25$, $\sigma_X = 0.5$

(2) $P_X(n) = 0.25$ for $n = 1, 2, 3, 4$ (the table is garbled in the transcript; this PMF is consistent with the stated results):
$\mu_X = 5/2$, $\mathrm{var}_X = 1.25$, $\sigma_X = \sqrt{1.25} \approx 1.118$
(1) Moment (raw form)
$m_k = E(X^k) = \sum_n n^k P_X(n)$ (discrete case)
$m_k = E(X^k) = \int_{-\infty}^{\infty} x^k f_X(x)\,dx$ (continuous case)

(2) Moment (central form)
$\bar m_k = E\big((X - \mu_X)^k\big) = \sum_n (n - \mu_X)^k P_X(n)$ (discrete case)
$\bar m_k = E\big((X - \mu_X)^k\big) = \int_{-\infty}^{\infty} (x - \mu_X)^k f_X(x)\,dx$ (continuous case)
(3) Moment (standardized form)

$v_k = \dfrac{E\big((X - \mu_X)^k\big)}{\big[E\big((X - \mu_X)^2\big)\big]^{k/2}} = \dfrac{\bar m_k}{\sigma_X^k} = \dfrac{\sum_n (n - \mu_X)^k P_X(n)}{\big[\sum_n (n - \mu_X)^2 P_X(n)\big]^{k/2}}$ (discrete case)

$v_k = \dfrac{\bar m_k}{\sigma_X^k} = \dfrac{\int (x - \mu_X)^k f_X(x)\,dx}{\big[\int (x - \mu_X)^2 f_X(x)\,dx\big]^{k/2}}$ (continuous case)
Order | Moment (raw) | Moment (central) | Moment (standardized)
k = 1 | Mean         | 0                | 0
k = 2 |              | Variance         | 1
k = 3 |              |                  | Skewness
k = 4 |              |                  | Kurtosis
k = 5 |              |                  | Hyperskewness
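The three forms in the table can be computed side by side from a PMF. A sketch in Python/NumPy (the function name `discrete_moments` is my own):

```python
import numpy as np

def discrete_moments(values, pmf, k):
    """Return the raw, central, and standardized k-th moments of a PMF."""
    n = np.asarray(values, dtype=float)
    p = np.asarray(pmf, dtype=float)
    raw = np.sum(n**k * p)                    # m_k = E[X^k]
    mu = np.sum(n * p)                        # mean (raw moment with k = 1)
    central = np.sum((n - mu)**k * p)         # E[(X - mu)^k]
    var = np.sum((n - mu)**2 * p)             # central moment with k = 2
    standardized = central / var**(k / 2)     # v_k = central / sigma^k
    return raw, central, standardized

# Check the table entries for the PMF P_X(1) = P_X(2) = 0.5:
raw1, central1, std1 = discrete_moments([1, 2], [0.5, 0.5], 1)  # mean, 0, 0
raw2, central2, std2 = discrete_moments([1, 2], [0.5, 0.5], 2)  # -, variance, 1
```

For k = 1 the central and standardized moments vanish, and for k = 2 the standardized moment is always 1, exactly as the table states.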
Skewness

It indicates the relative locations of the high-probability region and the long tail with respect to the mean:

$\mathrm{skewness} = \dfrac{\sum_n (n - \mu_X)^3 P_X(n)}{\big[\sum_n (n - \mu_X)^2 P_X(n)\big]^{3/2}}$ or $\dfrac{\int (x - \mu_X)^3 f_X(x)\,dx}{\big[\int (x - \mu_X)^2 f_X(x)\,dx\big]^{3/2}}$

skewness > 0: the high-probability region is to the left of the mean (long tail on the right).
skewness < 0: the high-probability region is to the right of the mean (long tail on the left).
skewness = 0: symmetry.
[Figure: three example PDFs, all with mean 0, having skewness 0, 0.4992, and -0.4992.]
Kurtosis

It indicates how sharp the high-probability region is:

$\mathrm{kurtosis} = \dfrac{\sum_n (n - \mu_X)^4 P_X(n)}{\big[\sum_n (n - \mu_X)^2 P_X(n)\big]^{2}}$ or $\dfrac{\int (x - \mu_X)^4 f_X(x)\,dx}{\big[\int (x - \mu_X)^2 f_X(x)\,dx\big]^{2}}$

kurtosis $\ge$ 0. A large kurtosis means that the high-probability region is sharp.
[Figure: six example PDFs with their kurtosis and standard deviation values, as far as they are recoverable from the transcript:
$f_X(x) = e^{-2|x|}$: kurtosis = 6, std = 0.707
$f_X(x) = \frac{1}{2}\, e^{-|x|}$: kurtosis = 5.86, std = 1.414
$f_X(x) = \frac{1}{4}\, e^{-|x|/2}$: kurtosis = 4.38, std = 2.647
$f_X(x) = 0.396\, e^{-|x|^{3/2}/2}$: kurtosis = 2.4184, std = 0.864
$f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$: kurtosis = 3, std = 1
$f_X(x) = \frac{1}{2\sqrt{2}}\, e^{-|x|/\sqrt{2}}$: kurtosis = 5.2903, std = 1.9726]
[Example 2] Suppose that X is uniformly distributed in $x \in [0, 10]$. Determine the central moments, the standardized moments, the variance, the skewness, and the kurtosis of X.

$f_X(x) = \dfrac{1}{10}$ for $0 \le x \le 10$; $f_X(x) = 0$ otherwise.

(Solution)

$\mu_X = \int_0^{10} \frac{x}{10}\,dx = 5$

Central moment:

$\bar m_k = \int_0^{10} \frac{(x - 5)^k}{10}\,dx = \frac{1}{10}\int_{-5}^{5} u^k\,du = \begin{cases} 0 & \text{if } k \text{ is odd} \\[4pt] \dfrac{5^k}{k+1} & \text{if } k \text{ is even} \end{cases}$
Standardized moment:

$v_k = \dfrac{\bar m_k}{\sigma_X^k} = \dfrac{\bar m_k}{(25/3)^{k/2}} = \begin{cases} 0 & \text{if } k \text{ is odd} \\[4pt] \dfrac{3^{k/2}}{k+1} & \text{if } k \text{ is even} \end{cases}$

Variance: $\mathrm{var}_X = \bar m_2 = \dfrac{25}{3}$
Skewness: $v_3 = 0$
Kurtosis: $v_4 = \dfrac{9}{5}$
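These closed forms can be checked numerically. A sketch in Python/NumPy using a simple Riemann sum over the density of Example 2 (grid size and variable names are my choices):

```python
import numpy as np

# X uniform on [0, 10]: f_X(x) = 1/10.
x = np.linspace(0.0, 10.0, 1_000_001)
dx = x[1] - x[0]
f = np.full_like(x, 0.1)

def central_moment(k):
    # bar_m_k = integral of (x - 5)^k * f_X(x) dx
    return float(np.sum((x - 5.0)**k * f) * dx)

variance = central_moment(2)                    # expect 25/3 = 8.333...
skewness = central_moment(3) / variance**1.5    # expect 0
kurtosis = central_moment(4) / variance**2      # expect 9/5 = 1.8
```

The odd central moments cancel by symmetry of the grid around 5, matching the "0 if k is odd" case above.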
6.1.3 Correlation

Covariance
$\mathrm{cov}_{X,Y} = E\big((X - \mu_X)(Y - \mu_Y)\big)$
$\mathrm{cov}_{X,Y} = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, P_{X,Y}(x, y)$ (discrete case)
$\mathrm{cov}_{X,Y} = \iint (x - \mu_X)(y - \mu_Y)\, f_{X,Y}(x, y)\,dx\,dy$ (continuous case)

Note:
(1) When X = Y, $\mathrm{cov}_{X,X} = \mathrm{var}_X$.
(2) $\mathrm{cov}_{X,Y} = E(XY) - \mu_X \mu_Y$.
Correlation

$\mathrm{corr}_{X,Y} = \dfrac{\mathrm{cov}_{X,Y}}{\sigma_X \sigma_Y} = \dfrac{E\big((X - \mu_X)(Y - \mu_Y)\big)}{\sigma_X \sigma_Y}$

$\mathrm{corr}_{X,Y} = \dfrac{\sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, P_{X,Y}(x, y)}{\sqrt{\sum_x (x - \mu_X)^2 P_X(x)}\;\sqrt{\sum_y (y - \mu_Y)^2 P_Y(y)}}$ (discrete case)

$\mathrm{corr}_{X,Y} = \dfrac{\iint (x - \mu_X)(y - \mu_Y)\, f_{X,Y}(x, y)\,dx\,dy}{\sqrt{\int (x - \mu_X)^2 f_X(x)\,dx}\;\sqrt{\int (y - \mu_Y)^2 f_Y(y)\,dy}}$ (continuous case)
Note:
(1) $-1 \le \mathrm{corr}_{X,Y} \le 1$.
(2) When $Y = cX + d$, where c is a positive constant and d is a constant, $\mathrm{corr}_{X,Y} = 1$.
(3) When $Y = cX + d$, where c is a negative constant and d is a constant, $\mathrm{corr}_{X,Y} = -1$.
(4) If Y is independent of X,
$\mathrm{corr}_{X,Y} = \dfrac{E\big((X - \mu_X)(Y - \mu_Y)\big)}{\sigma_X \sigma_Y} = \dfrac{E(X - \mu_X)\,E(Y - \mu_Y)}{\sigma_X \sigma_Y} = 0.$
(i) Full correlation: $|\mathrm{corr}_{X,Y}| \ge 0.9$
(ii) High correlation: $0.6 \le |\mathrm{corr}_{X,Y}| < 0.9$
(iii) Middle correlation: $0.3 \le |\mathrm{corr}_{X,Y}| < 0.6$
(iv) Low correlation: $|\mathrm{corr}_{X,Y}| < 0.3$
(v) Positive correlation: $\mathrm{corr}_{X,Y} > 0$
(vi) Negative correlation: $\mathrm{corr}_{X,Y} < 0$
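The bands above can be applied to sample data. A sketch in Python/NumPy (the band labels follow the slide; the function names and thresholds-as-code are my own packaging):

```python
import numpy as np

def correlation(x, y):
    """Sample estimate of corr_{X,Y} = cov_{X,Y} / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

def classify(corr):
    """Map |corr| into the bands (i)-(iv) listed above."""
    a = abs(corr)
    if a >= 0.9:
        return "full"
    if a >= 0.6:
        return "high"
    if a >= 0.3:
        return "middle"
    return "low"

x = np.arange(100.0)
r_linear = correlation(x, 3.0 * x + 7.0)   # Y = cX + d with c > 0: corr = 1
band = classify(r_linear)                   # "full"
```

This also illustrates Note (2) above: an exact positive linear relation gives correlation 1.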
[Example 3] Determine the covariance and the correlation of X and Y if

$f_{X,Y}(x, y) = \dfrac{1}{50}$ for $0 \le x \le 10$, $\dfrac{x}{2} \le y \le \dfrac{x}{2} + 5$; $f_{X,Y}(x, y) = 0$ otherwise.

(Solution)

(Step 1) First, calculate the marginals:

$f_X(x) = \int f_{X,Y}(x, y)\,dy = \int_{x/2}^{x/2+5} \frac{1}{50}\,dy = \frac{1}{10}$ for $0 \le x \le 10$

$f_Y(y) = \int f_{X,Y}(x, y)\,dx = \int_{\max(0,\,2y-10)}^{\min(10,\,2y)} \frac{1}{50}\,dx = \begin{cases} y/25 & \text{for } 0 \le y \le 5 \\ (10 - y)/25 & \text{for } 5 \le y \le 10 \\ 0 & \text{otherwise} \end{cases}$
(Step 2)

$\mu_X = \int_0^{10} x\,f_X(x)\,dx = \int_0^{10} \frac{x}{10}\,dx = 5$

$\mu_Y = \int_0^{5} \frac{y^2}{25}\,dy + \int_5^{10} \frac{y(10 - y)}{25}\,dy = 5$

(Step 3) To determine the covariance,

$\mathrm{cov}_{X,Y} = \iint (x - \mu_X)(y - \mu_Y)\, f_{X,Y}(x, y)\,dx\,dy = \int_0^{10}\!\!\int_{x/2}^{x/2+5} \frac{(x - 5)(y - 5)}{50}\,dy\,dx = \int_0^{10} \frac{(x - 5)^2}{20}\,dx = \frac{25}{6}$
(Step 4) To determine the correlation, first determine the variances:

$\sigma_X^2 = \int_0^{10} (x - 5)^2 f_X(x)\,dx = \int_0^{10} \frac{(x - 5)^2}{10}\,dx = \frac{25}{3}$

$\sigma_Y^2 = \int_0^{5} (y - 5)^2\,\frac{y}{25}\,dy + \int_5^{10} (y - 5)^2\,\frac{10 - y}{25}\,dy = 2\int_0^{5} (y - 5)^2\,\frac{y}{25}\,dy = \frac{25}{6}$

(the two integrals are equal; set $y_1 = 10 - y$ in the second one)

Therefore, $\sigma_X = \dfrac{5}{\sqrt{3}}$ and $\sigma_Y = \dfrac{5}{\sqrt{6}}$.

(Step 5)

$\mathrm{corr}_{X,Y} = \dfrac{\mathrm{cov}_{X,Y}}{\sigma_X \sigma_Y} = \dfrac{25/6}{(5/\sqrt{3})(5/\sqrt{6})} = \dfrac{25/6}{25/\sqrt{18}} = \dfrac{1}{\sqrt{2}} \approx 0.707$

(X and Y are highly correlated.)
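Example 3 can also be checked by Monte Carlo. Since the density is constant on the strip, X is uniform on [0, 10] and, given X = x, Y is uniform on [x/2, x/2 + 5]; that sampling scheme (and the seed and sample size) are my choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 2_000_000

# f_{X,Y}(x, y) = 1/50 on 0 <= x <= 10, x/2 <= y <= x/2 + 5:
# sample X uniformly, then Y uniformly on the conditional interval.
x = rng.uniform(0.0, 10.0, n)
y = x / 2.0 + rng.uniform(0.0, 5.0, n)

cov = np.mean((x - x.mean()) * (y - y.mean()))  # expect 25/6 ~ 4.1667
corr = cov / (x.std() * y.std())                # expect 1/sqrt(2) ~ 0.7071
```

With two million samples the estimates agree with Steps 3-5 to about three decimal places.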
[Example 4] Note that if

$f_{X,Y}(x, y) = \frac{1}{10}\,\delta(x - y)$ for $0 \le x \le 10$; $f_{X,Y}(x, y) = 0$ otherwise,

then

$f_X(x) = \int f_{X,Y}(x, y)\,dy = \frac{1}{10}$, $f_Y(y) = \int f_{X,Y}(x, y)\,dx = \frac{1}{10}$ (page 341)

$\mu_X = \int_0^{10} \frac{x}{10}\,dx = 5$, $\mu_Y = \int_0^{10} \frac{y}{10}\,dy = 5$

$\mathrm{cov}_{X,Y} = \iint (x - 5)(y - 5)\,\frac{\delta(x - y)}{10}\,dy\,dx = \int_0^{10} \frac{(x - 5)^2}{10}\,dx = \frac{25}{3}$ (page 342)

$\sigma_X^2 = \int_0^{10} \frac{(x - 5)^2}{10}\,dx = \frac{25}{3}$, $\sigma_Y^2 = \int_0^{10} \frac{(y - 5)^2}{10}\,dy = \frac{25}{3}$

$\mathrm{corr}_{X,Y} = \dfrac{\mathrm{cov}_{X,Y}}{\sigma_X \sigma_Y} = 1$
[Example 5] If

$f_{X,Y}(x, y) = \dfrac{1}{100}$ for $0 \le x \le 10$ and $0 \le y \le 10$; $f_{X,Y}(x, y) = 0$ otherwise,

then

$f_X(x) = \int_0^{10} \frac{1}{100}\,dy = \frac{1}{10}$, $f_Y(y) = \int_0^{10} \frac{1}{100}\,dx = \frac{1}{10}$

$\mu_X = \int_0^{10} \frac{x}{10}\,dx = 5$, $\mu_Y = \int_0^{10} \frac{y}{10}\,dy = 5$

$\mathrm{cov}_{X,Y} = \int_0^{10}\!\!\int_0^{10} \frac{(x - 5)(y - 5)}{100}\,dy\,dx = 0$ (odd symmetry with respect to (5, 5))

$\mathrm{corr}_{X,Y} = \dfrac{\mathrm{cov}_{X,Y}}{\sigma_X \sigma_Y} = 0$
6.2 Probability Model

6.2.1 Discrete Probability Model

(1) Uniform Distribution

PMF: $P_X(n) = \dfrac{1}{N}$ for $n = a, a+1, \ldots, a+N-1$

Mean: $\mu_X = a + \dfrac{N - 1}{2}$
Standard deviation: $\sigma_X = \sqrt{\dfrac{N^2 - 1}{12}}$
Skewness: 0

[Figure: PMF for a = 1, N = 6.]
(2) Binomial Distribution

PMF: $P_X(n) = \binom{N}{n} p^n (1-p)^{N-n}$ for $n = 0, 1, \ldots, N$

[Physical meaning]: If we perform a trial N times and each time the success rate is p, then $P_X(n)$ is the probability that the number of successful trials is n.

Mean: $\mu_X = Np$
Standard deviation: $\sigma_X = \sqrt{Np(1-p)}$
Skewness: $\dfrac{1 - 2p}{\sqrt{Np(1-p)}}$

[Figure: PMF for p = 1/2, N = 5.]

When N = 1, the binomial distribution is called the Bernoulli distribution.
(3) Geometric Distribution

PMF: $P_X(n) = p\,(1-p)^{n-1}$ for $n = 1, 2, 3, \ldots$

[Physical meaning]: If each trial has success rate p, then $P_X(n)$ is the probability that the first successful trial is the n-th trial.

Mean: $\mu_X = 1/p$
Standard deviation: $\sigma_X = \dfrac{\sqrt{1-p}}{p}$
Skewness: $\dfrac{2 - p}{\sqrt{1-p}}$

[Figure: PMF for p = 0.4.]
(4) Hypergeometric Distribution

PMF: $P_X(n) = \dfrac{\binom{K}{n}\binom{N-K}{m-n}}{\binom{N}{m}}$ for $n = 0, 1, 2, \ldots, \min(m, K)$

[Physical meaning]: Suppose that there are N balls in a set and a subset contains K of them. If we choose m balls from the set, then $P_X(n)$ is the probability that n of the chosen balls come from the subset.

Mean: $\mu_X = mK/N$
Standard deviation: $\sigma_X = \sqrt{\dfrac{mK(N-K)(N-m)}{N^2 (N-1)}}$
Skewness: $\dfrac{(N - 2K)\,(N-1)^{1/2}\,(N - 2m)}{\big[mK(N-K)(N-m)\big]^{1/2}\,(N-2)}$

[Figure: PMF for N = 12, K = 8, m = 6.]
(5) Poisson Distribution

PMF: $P_X(n) = \dfrac{\lambda^n}{n!}\, e^{-\lambda}$

[Physical meaning]: Suppose that, within a certain time interval, an event occurs $\lambda$ times on average. Then $P_X(n)$ is the probability that the event occurs n times within the time interval.

Mean: $\mu_X = \lambda$
Standard deviation: $\sigma_X = \sqrt{\lambda}$
Skewness: $1/\sqrt{\lambda}$

[Figure: PMF for $\lambda = 3$.]
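The tabulated mean, standard deviation, and skewness formulas for these discrete models can be cross-checked with `scipy.stats` (a sketch; the parameter values match the figures above):

```python
import numpy as np
from scipy import stats

# Binomial with N = 5, p = 0.5.
m_b, v_b, s_b = stats.binom.stats(5, 0.5, moments="mvs")
# mean Np = 2.5, var Np(1-p) = 1.25, skewness (1-2p)/sqrt(Np(1-p)) = 0

# Geometric with p = 0.4 (scipy's geom also has support n = 1, 2, ...).
m_g, v_g, s_g = stats.geom.stats(0.4, moments="mvs")
# mean 1/p = 2.5, var (1-p)/p^2 = 3.75, skewness (2-p)/sqrt(1-p)

# Poisson with lambda = 3.
m_p, v_p, s_p = stats.poisson.stats(3.0, moments="mvs")
# mean 3, var 3, skewness 1/sqrt(3)
```

Note that `stats` returns the variance, so the standard deviation in the slides is the square root of the middle value.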
6.2.2 Continuous Probability Model

(1) Uniform Distribution

PDF: $f_X(x) = \dfrac{1}{b - a}$ for $a < x < b$; $f_X(x) = 0$ otherwise

Mean: $\mu_X = \dfrac{a + b}{2}$
Standard deviation: $\sigma_X = \dfrac{b - a}{\sqrt{12}}$
Skewness: 0
(2) Exponential Distribution

PDF: $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$; $f_X(x) = 0$ for $x < 0$

Mean: $\mu_X = 1/\lambda$
Standard deviation: $\sigma_X = 1/\lambda$
Skewness: 2

[Figure: PDF for $\lambda = 1$.]
(3) Normal Distribution (Gaussian Distribution)

PDF: $f_X(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

Mean: $\mu_X = \mu$
Standard deviation: $\sigma_X = \sigma$
Skewness: 0

[Figure: PDF for $\mu = 0$, $\sigma = 1$.]

The normal distribution is the most popular probability distribution. However, is it reasonable?
Confidence Level

The confidence level is the probability that the data value is within some confidence interval [a, b]:

$\text{confidence level} = \mathrm{Prob}(a \le X \le b) = F_X(b) - F_X(a)$
Some confidence levels for the normal distribution:

$\mathrm{Prob}(|X - \mu| \le \sigma)$ = 68.2689%
$\mathrm{Prob}(|X - \mu| \le 2\sigma)$ = 95.4500%
$\mathrm{Prob}(|X - \mu| \le 3\sigma)$ = 99.7300%
$\mathrm{Prob}(|X - \mu| \le 4\sigma)$ = 99.9937%
$\mathrm{Prob}(|X - \mu| \le 5\sigma)$ = 99.99994%
$\mathrm{Prob}(|X - \mu| \le 6\sigma)$ = 99.9999998%
$\mathrm{Prob}(|X - \mu| \le 7\sigma)$ = 99.9999999997%

$\mathrm{Prob}(|X - \mu| > 7\sigma) \approx 3 \times 10^{-12}$
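These values follow directly from the normal CDF: $\mathrm{Prob}(|X - \mu| \le k\sigma) = \mathrm{erf}(k/\sqrt{2})$. A sketch using only the Python standard library:

```python
from math import erf, erfc, sqrt

def k_sigma_level(k):
    """Prob(|X - mu| <= k*sigma) for a normal X: erf(k / sqrt(2))."""
    return erf(k / sqrt(2.0))

# Reproduce the table above for k = 1, ..., 7.
levels = {k: k_sigma_level(k) for k in range(1, 8)}
# e.g. levels[1] = 0.682689..., levels[2] = 0.954499..., levels[3] = 0.997300...

# The complementary tail at 7 sigma, Prob(|X - mu| > 7*sigma),
# computed with erfc to avoid cancellation near 1:
tail = erfc(7.0 / sqrt(2.0))
```

Using `erfc` for the tail keeps full precision where `1 - erf(...)` would lose digits.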
(4) Laplace Distribution

PDF: $f_X(x) = \dfrac{\lambda}{2}\, e^{-\lambda |x|}$

Mean: $\mu_X = 0$
Standard deviation: $\sigma_X = \dfrac{\sqrt{2}}{\lambda}$
Skewness: 0

[Figure: PDF for $\lambda = 1$.]
(5) Hyper-Laplacian Distribution

PDF: $f_X(x) = C e^{-\lambda |x|^{\alpha}}$ where $C = \dfrac{1}{2\int_0^{\infty} e^{-\lambda x^{\alpha}}\,dx}$

Mean: $\mu_X = 0$
Standard deviation: decreases as $\lambda$ increases
Skewness: 0

[Figures: PDFs for $\lambda = 1$, $\alpha = 2/3$ and for $\lambda = 1$, $\alpha = 1/2$.]
(6) Log-Normal Distribution

PDF: $f_X(x) = \dfrac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - u)^2}{2\sigma^2}}$ where $x > 0$

Mean: $\mu_X = \exp(u + \sigma^2/2)$
Standard deviation: $\sigma_X = \sqrt{e^{\sigma^2} - 1}\,\exp(u + \sigma^2/2)$
Skewness: $\big(e^{\sigma^2} + 2\big)\sqrt{e^{\sigma^2} - 1}$

[Figure: PDF for $\sigma = 1$, $u = 0$.]
(7) Rayleigh Distribution

PDF: $f_X(x) = \dfrac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)}$ where $x > 0$

Mean: $\mu_X = \sigma\sqrt{\dfrac{\pi}{2}}$
Standard deviation: $\sigma_X = \sigma\sqrt{\dfrac{4 - \pi}{2}}$
Skewness: $\dfrac{2\sqrt{\pi}\,(\pi - 3)}{(4 - \pi)^{3/2}}$

[Figure: PDF for $\sigma = 1$.]
(8) Pareto Distribution

PDF: $f_X(x) = \dfrac{\alpha\, x_0^{\alpha}}{x^{\alpha+1}}$ when $x > x_0$; $f_X(x) = 0$ otherwise, where $x_0 > 0$, $\alpha > 0$

Mean: $\mu_X = \dfrac{\alpha\, x_0}{\alpha - 1}$ when $\alpha > 1$; $\mu_X = \infty$ when $\alpha \le 1$
Standard deviation: $\sigma_X = \dfrac{x_0}{\alpha - 1}\sqrt{\dfrac{\alpha}{\alpha - 2}}$ when $\alpha > 2$; $\sigma_X = \infty$ when $\alpha \le 2$
Skewness: $\dfrac{2(1 + \alpha)}{\alpha - 3}\sqrt{\dfrac{\alpha - 2}{\alpha}}$ when $\alpha > 3$; the skewness is $\infty$ when $\alpha \le 3$

[Figure: PDF for $x_0 = 1$, $\alpha = 3.5$.]
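As with the discrete models, the continuous-model formulas can be cross-checked with `scipy.stats`. A sketch for the Rayleigh and Pareto cases (note the parameterizations: scipy's `rayleigh` takes `scale` = $\sigma$, and `pareto` takes the shape $\alpha$ with `scale` = $x_0$):

```python
from math import pi, sqrt
from scipy import stats

# Rayleigh with sigma = 1: mean sigma*sqrt(pi/2), var sigma^2*(4 - pi)/2.
m_r, v_r = stats.rayleigh.stats(scale=1.0, moments="mv")

# Pareto with x0 = 1, alpha = 3.5 (as in the figure):
# mean alpha*x0/(alpha - 1), var (x0/(alpha - 1))^2 * alpha/(alpha - 2).
alpha = 3.5
m_p, v_p = stats.pareto.stats(alpha, scale=1.0, moments="mv")
```

Both returned variances match the squared standard deviations given in the slides.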
6.3 Entropy

Discrete Case

The entropy of X can be denoted by H(X):

$\text{Entropy} = -\sum_n P_X(n) \ln P_X(n)$

In fact, $\text{Entropy} = E\big(-\ln P_X(n)\big)$.

Continuous Case

$\text{Entropy} = -\int f_X(x) \ln f_X(x)\,dx$

In fact, $\text{Entropy} = E\big(-\ln f_X(x)\big)$.
Note:
(1) In the discrete case, since $0 \le P_X(n) \le 1$, we have $-\ln P_X(n) \ge 0$ and hence Entropy $\ge 0$. (In the continuous case $f_X(x)$ may exceed 1, so the differential entropy can be negative.)
(2) In some literature, the entropy of X is denoted by $H(X)$.
(3) When $P_X(n) = 0$, we can set $P_X(n) \ln P_X(n) = 0$ when calculating the entropy.
[Example 1] If $P_X(1) = 1$ and $P_X(n) = 0$ otherwise, then
$H(X) = -1 \cdot \ln(1) = 0$

[Example 2] If $P_X(1) = 0.8$, $P_X(2) = 0.2$, and $P_X(n) = 0$ otherwise, then
$H(X) = -0.8 \ln(0.8) - 0.2 \ln(0.2) = 0.5004$

[Example 3] If $P_X(1) = 0.5$, $P_X(2) = 0.5$, and $P_X(n) = 0$ otherwise, then
$H(X) = -0.5 \ln(0.5) - 0.5 \ln(0.5) = \ln 2 = 0.6931$
[Example 4] If $P_X(1) = 0.7$, $P_X(2) = 0.1$, $P_X(3) = 0.1$, $P_X(4) = 0.1$, and $P_X(n) = 0$ otherwise, then
$H(X) = -0.7 \ln(0.7) - 3 \cdot 0.1 \ln(0.1) = 0.9404$

[Example 5] If $P_X(1) = P_X(2) = P_X(3) = P_X(4) = 0.25$ and $P_X(n) = 0$ otherwise, then
$H(X) = -4 \cdot 0.25 \ln(0.25) = \ln 4 = 1.3863$
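Examples 1 through 5 can be reproduced with a few lines of Python/NumPy; the `0 * ln(0) = 0` convention of Note (3) is handled by dropping zero-probability terms (the function name is mine):

```python
import numpy as np

def entropy(pmf):
    """H(X) = -sum_n P_X(n) * ln(P_X(n)), with 0 * ln(0) taken as 0."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]          # terms with P_X(n) = 0 contribute nothing
    return float(-np.sum(p * np.log(p)))

h1 = entropy([1.0])                  # Example 1: 0
h2 = entropy([0.8, 0.2])             # Example 2: 0.5004
h3 = entropy([0.5, 0.5])             # Example 3: ln 2 = 0.6931
h4 = entropy([0.7, 0.1, 0.1, 0.1])   # Example 4: 0.9404
h5 = entropy([0.25] * 4)             # Example 5: ln 4 = 1.3863
```

The examples also illustrate that, for a fixed number of outcomes, entropy is largest when the PMF is uniform.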
Main Applications of Entropy
(a) Thermodynamics
(b) Information theory (less entropy = more meaningful information)
(c) Data compression (the entropy computed with $\log_2$ gives the average number of bits needed for each input symbol)
(d) Optimization, classification, machine learning
6.4 Kullback-Leibler Divergence

6.4.1 Definition

The Kullback-Leibler divergence (KL divergence) measures the difference between two probability distributions. In the discrete case, suppose that there are two probability distributions $P_X(n)$ (the true probability) and $P_Y(n)$ (the approximated probability model). Then the KL divergence from $P_Y(n)$ to $P_X(n)$ is

$D_{KL}(X \,\|\, Y) = \sum_n L_{X,Y}(n)$

where

$L_{X,Y}(n) = P_X(n) \ln \dfrac{P_X(n)}{P_Y(n)}$ if $P_X(n) \ne 0$;  $L_{X,Y}(n) = 0$ if $P_X(n) = 0$.

Note:
(1) If $P_X(n) = P_Y(n)$ for all n, then $D_{KL}(X \,\|\, Y) = 0$.
(2) If there exists some n such that $P_Y(n) = 0$ but $P_X(n) \ne 0$, then $D_{KL}(X \,\|\, Y) = \infty$.
(3) In fact, $D_{KL}(X \,\|\, Y) = -\sum_n P_X(n) \ln P_Y(n) - H(X)$.
(4) Usually, X is the true probability and Y is the probability model.
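The definition, including the two conventions in Notes (1) and (2), can be sketched directly in Python (the function name is mine):

```python
import math

def kl_divergence(p_x, p_y):
    """D_KL(X || Y) = sum over n of P_X(n) * ln(P_X(n) / P_Y(n)).

    Terms with P_X(n) = 0 contribute 0.  If some P_Y(n) = 0 while
    P_X(n) != 0, the divergence is +infinity (Note (2) above).
    """
    total = 0.0
    for px, py in zip(p_x, p_y):
        if px == 0:
            continue                 # convention: 0 * ln(0 / p) = 0
        if py == 0:
            return math.inf          # model gives zero to a possible outcome
        total += px * math.log(px / py)
    return total

d_same = kl_divergence([0.5, 0.5], [0.5, 0.5])   # identical distributions: 0
d_diff = kl_divergence([0.8, 0.2], [0.5, 0.5])   # positive for a mismatch
d_inf = kl_divergence([1.0, 0.0], [0.0, 1.0])    # Note (2): infinity
```

Note the asymmetry: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, which is why the text distinguishes the true probability X from the model Y.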