
Understanding Poisson Distribution for Probability Approximation in Statistics
Explore the concept of Poisson Distribution and its application in approximating binomial situations with large trial numbers and small success probabilities. Discover how Poisson distribution proves to be a reliable method in various real-world scenarios and examples for statistical analysis.
MAT 2572 Probability w/Statistics, Halleck Day 16 slides: 4.2 Poisson Distribution
Poisson Distribution
Often in a binomial situation, the number of trials n is large and the probability of success p is small. Let $\lambda = np$ and approximate the binomial distribution with the Poisson (a new distribution):
$P(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$
For fixed $\lambda$, the approximation gets better as n gets large. We would like to:
1. Prove that $P(k)$ is in fact a probability distribution.
2. Prove that $P(k)$ approximates the binomial distribution (for n large, p small).
3. Provide some binomial examples approximated by Poisson.
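Poisson Distribution
The following defines a probability distribution (for any $\lambda > 0$):
$P(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$
Proof (what kind of RV is this? What 2 things do we need to check?):
Clearly the probability for each outcome is positive.
Sum of probs $= \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1$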
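Proof of Poisson approximation of binomial
Let X be binomial with n trials and $p = \lambda/n$. Then
$\lim_{n\to\infty} P(X = k) = \lim_{n\to\infty} \binom{n}{k} p^k (1-p)^{n-k} = \lim_{n\to\infty} \frac{n!}{k!\,(n-k)!} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}$
$= \frac{\lambda^k}{k!} \lim_{n\to\infty} \left(1 - \frac{\lambda}{n}\right)^n \frac{n!}{(n-k)!\,(n-\lambda)^k} = \frac{\lambda^k e^{-\lambda}}{k!} \lim_{n\to\infty} \frac{n!}{(n-k)!\,(n-\lambda)^k} = \frac{\lambda^k e^{-\lambda}}{k!}$
[Last limit goes to 1: $\frac{n(n-1)\cdots(n-k+1)}{(n-\lambda)(n-\lambda)\cdots(n-\lambda)} = \frac{n}{n-\lambda} \cdot \frac{n-1}{n-\lambda} \cdots \frac{n-k+1}{n-\lambda}$ (this is a finite product of ratios and each ratio goes to 1)]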
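The two proofs above can be sanity-checked numerically. The following is a minimal sketch, not part of the original slides: the values $\lambda = 2$ and $k = 3$ and the helper names `binomial_pmf` and `poisson_pmf` are illustrative assumptions. It sums the Poisson probabilities (which should come out close to 1) and shows the gap between binomial and Poisson shrinking as n grows with $\lambda$ held fixed.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k) = lam^k e^(-lam) / k! for the Poisson distribution."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam, k = 2.0, 3  # illustrative values, not taken from the slides

# Check 1: the Poisson probabilities sum to 1 (the tail past k = 60 is negligible here).
total = sum(poisson_pmf(j, lam) for j in range(60))
print(f"sum of Poisson probabilities: {total:.12f}")

# Check 2: with lam fixed and p = lam / n, the binomial approaches the Poisson.
errors = []
for n in (10, 100, 1000, 10000):
    err = abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
    errors.append(err)
    print(f"n = {n:>5}: |binomial - Poisson| = {err:.6f}")
```

The printed differences should decrease roughly in proportion to 1/n, which matches the claim that the approximation improves as n gets large.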
Binomial example approximated by Poisson
Every mile driven in a city has a 1 in 10,000 chance of a car accident caused by someone else's careless driving. In a given year, you drive 2000 miles in the city without ever being careless. What is the chance that you will have no accidents? Exactly 1 accident? More than 1 accident?
Using the binomial distribution:
P(0) = $(9999/10000)^{2000}$ = 81.872%
P(1) = $2000\,(1/10000)\,(9999/10000)^{1999}$ = 16.376%
P(>1) = 1 - (81.872% + 16.376%) = 1.752%
Using the Poisson approximation: $\lambda = np = 2000(1/10{,}000) = 1/5$ and $P(k) = \lambda^k e^{-\lambda}/k!$
P(0) = $e^{-\lambda} = e^{-0.2}$ = 81.873%
P(1) = $\lambda e^{-\lambda}$ = 16.375%
P(>1) = 1 - (81.873% + 16.375%) = 1.752%
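The accident calculation can be reproduced in a few lines. This is a small sketch using only the numbers stated in the example (n = 2000 miles, p = 1/10,000); the variable names are my own.

```python
import math

n, p = 2000, 1 / 10_000   # miles driven, accident chance per mile
lam = n * p               # Poisson parameter, 0.2

# Exact binomial probabilities
binom_p0 = (1 - p) ** n
binom_p1 = n * p * (1 - p) ** (n - 1)
binom_more = 1 - binom_p0 - binom_p1

# Poisson approximation
pois_p0 = math.exp(-lam)
pois_p1 = lam * math.exp(-lam)
pois_more = 1 - pois_p0 - pois_p1

for label, b, q in [("P(0)", binom_p0, pois_p0),
                    ("P(1)", binom_p1, pois_p1),
                    ("P(>1)", binom_more, pois_more)]:
    print(f"{label:>5}: binomial {b:.3%}   Poisson {q:.3%}")
```

The two columns agree to about three significant figures, which is why the Poisson formula is a convenient stand-in when n is large and p is small.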
More binomial examples approximated by Poisson
1. # of kids on a subway line on a particular day getting caught jumping turnstiles with no one around and no cameras (1 in 1000?).
2. # of children born in NYC with Down's syndrome in a given year to 30-year-old mothers who are not tested (1 in 1000).
3. # of mistakes in a 1000-word essay (1 in 200).
4. # of people in a filled auditorium with a particular birthday (say June 21st, which is Poisson's!).
5. # of pieces of luggage lost by a frequent flyer over the course of 5 years (say 300 flights, with every 200th checked bag lost).
How Poisson can be used to model
1. Use the data (given as a table) to calculate the mean $\bar{k} = \sum_{k=0}^{\infty} k \cdot p(k)$, where $p(k)$ is the observed proportion of the data equal to k.
2. On the same set of axes, graph the data and the Poisson distribution, using $\bar{k}$ for the parameter $\lambda$.
3. If the fit is pretty close (expect some discrepancy due to randomness), then you have found a good model. [Perhaps the next step is to figure out why (is there a binomial process behind the curtains?)]
Problem 4.2.12
Weekly luggage losses by a commuter airline (see Excel file)
1. Using the raw data, create a frequency table (shown below).
2. Use the table to get $\lambda = 1.44$ (about 1.4 bags lost per week).
3. Graph the data and the Poisson formula results on the same graph.
4. It is a pretty good fit, so we accept the model.
5. Next we ask: How are bags lost? Is there an underlying binomial process?
# of bags lost    frequency
0                 9
1                 13
2                 10
3                 5
4                 2
total             39
[Chart: # bags lost/wk by small commuter airline; series: experimental, Poisson]
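The modeling steps for the luggage data can be sketched as follows. The frequencies are the ones in the table above; the variable names and the choice to print a side-by-side comparison rather than draw a graph are my own simplifications.

```python
import math

# Frequency table from Problem 4.2.12: bags lost per week -> number of weeks
observed = {0: 9, 1: 13, 2: 10, 3: 5, 4: 2}

total_weeks = sum(observed.values())
mean = sum(k * f for k, f in observed.items()) / total_weeks  # estimate of lambda

def poisson_pmf(k, lam):
    """Poisson probability lam^k e^(-lam) / k!."""
    return lam**k * math.exp(-lam) / math.factorial(k)

print(f"weeks observed: {total_weeks}, mean bags lost per week: {mean:.2f}")
for k, f in observed.items():
    print(f"k = {k}: observed {f / total_weeks:.3f}   Poisson {poisson_pmf(k, mean):.3f}")
```

Comparing the two printed columns row by row plays the same role as overlaying the two series on one graph.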
Some criteria for Poisson
1. Do the events occur independently?
2. Does the average number of events occurring in each subinterval of a given width stay constant over the sample space S?
Examples and non-examples of Poisson
1. Example: # of clicks in a Geiger counter where the time interval is much less than the half-life.
   Non-example: # of clicks in a Geiger counter where the time interval is on the same order as the half-life.
   Why: the chance of a click decreases over time as the radioactive part is lost.
2. Example: # of empty taxi cabs which pass a given corner late at night.
   Non-example: # of buses arriving at a bus stop.
   Why: buses tend to bunch up during rush hours, and during non-rush hours they are on a schedule.
3. Example: # of parties arriving at a restaurant during prime time (or during non-prime time).
   Non-example: # of people arriving at a restaurant.
   Why: especially during prime times, people tend to meet as a family, organization or friend unit.
4. Example: # of car accidents on a road under dry conditions at a certain hour of the day.
   Non-example: # of car accidents on a stretch of road under all conditions.
   Why: the accident rate will depend on weather as well as the time of day and amount of traffic.
Exercise: come up with your own example and non-example!
Intervals Between Events: The Poisson/Exponential Relationship
Poisson is a DISCRETE distribution. We are counting how many events happen in one interval. Here is what a typical time-line looks like:
[Figure: time-line of events]
The data produced would be 1, 1, 2, 1, 0, 3, for a frequency table:
x    f(x)
0    1
1    3
2    1
3    1
We can instead measure the TIME between events, indicated by the y's in the diagram: 1.6, 0.4, 0.6, 1.0, 1.4, 0.2, 0.4
Time between eruptions of Mauna Key (see Excel file)
When graphing, do not use a histogram of raw frequencies for the data. Instead, use a DENSITY function: divide each frequency by the total # of data points and by the width of each interval. Thus the probability for an interval will correspond to the area under the curve!
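The density scaling can be illustrated with a short sketch. The eruption data lives in the Excel file and is not reproduced here, so this example borrows the seven inter-event times from the time-line slide as stand-in data. The bin width of 0.5 and the estimate of the exponential rate as 1 / (mean gap) are assumptions made for illustration.

```python
import math

# Stand-in data: the inter-event times listed on the time-line slide.
gaps = [1.6, 0.4, 0.6, 1.0, 1.4, 0.2, 0.4]
width = 0.5   # bin width (an arbitrary choice for this sketch)
n_bins = 4    # bins [0, 0.5), [0.5, 1.0), [1.0, 1.5), [1.5, 2.0)

# Raw frequency per bin
counts = [0] * n_bins
for t in gaps:
    counts[int(t // width)] += 1

# Density: frequency / (number of data points * bin width)
density = [c / (len(gaps) * width) for c in counts]
area = sum(d * width for d in density)  # total area under the density bars

# Assumption: estimate the exponential rate as 1 / (mean gap).
rate = len(gaps) / sum(gaps)

for i, d in enumerate(density):
    lo, hi = i * width, (i + 1) * width
    mid = (lo + hi) / 2
    pdf = rate * math.exp(-rate * mid)
    print(f"[{lo:.1f}, {hi:.1f}): density {d:.3f}   exponential pdf at midpoint {pdf:.3f}")
print(f"total area under density bars: {area:.3f}")
```

Because each bar's height is scaled by both the sample size and the bin width, the bar areas sum to 1 and can be compared directly with an exponential density curve. With only seven points the match is rough, as expected.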
Exercise 4.2.27
Fatal commercial airplane crashes worldwide occur at a rate of ~10 per yr. (http://www.planecrashinfo.com/cause.htm)
1. Give 2 reasons for assuming that such crashes are Poisson events and 2 reasons that question the use of Poisson.
2. What is the probability that four or more crashes will occur next year?
3. What is the probability that the next two crashes will occur within three months of one another?
Exercise 4.2.27 (cont.)
1. Give 2 reasons for assuming that such crashes are Poisson events and 2 reasons that question the use of Poisson.
Poisson: The rate does change over time, but this problem only concerns what happens within one year. Especially over the last 20 years, crashes have been isolated events: particular kinds of planes no longer have vulnerabilities that result in crashes (as happened in previous decades).
Exercise 4.2.27 (cont.)
1. Give 2 reasons for assuming that such crashes are Poisson events and 2 reasons that question the use of Poisson.
Not Poisson: The rate does change within a year, especially if a country has areas of harsh weather in the winter. Also, planes tend to travel full (and hence are more likely to crash) during holiday times. Planes will on occasion crash into each other (hence the events are not independent).
Exercise 4.2.27 (cont.)
2. What is the probability that 6 or more crashes will occur next year?
This is a Poisson problem with $\lambda = 10$:
$P(X \ge 6) = 1 - P(X = 0, 1, 2, 3, 4 \text{ or } 5) = 1 - e^{-10}\left(1 + \frac{10}{1} + \frac{10^2}{2} + \frac{10^3}{6} + \frac{10^4}{24} + \frac{10^5}{120}\right) = .9329$
So there is a 93% chance that 6 or more crashes will occur next year.
Here are the Excel commands:
=1-EXP(-10)*(1+10/1+10^2/2+10^3/6+10^4/24+10^5/120)
=1-POISSON.DIST(5,10,TRUE)
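For readers working outside Excel, the same tail probability can be computed with a short sketch like the following (standard library only; the variable names are my own).

```python
import math

lam = 10  # crashes per year

# P(X <= 5): sum the Poisson probabilities for k = 0..5
p_at_most_5 = sum(lam**k * math.exp(-lam) / math.factorial(k) for k in range(6))

# P(X >= 6) is the complement
p_at_least_6 = 1 - p_at_most_5
print(f"P(X >= 6) = {p_at_least_6:.4f}")
```

Working with the complement avoids summing an infinite tail, which is the same trick the Excel formulas use.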
Exercise 4.2.27 (cont.)
3. What is the probability that the next two crashes will occur within 25 days of each other?
This is exponential. Let Y represent the time between the next 2 crashes. The tricky thing here is to remember to convert 25 days to years.
$P(Y \le 25/365) = \int_0^{25/365} 10 e^{-10t}\,dt = \left[-e^{-10t}\right]_0^{25/365} = 1 - e^{-250/365} = .4959$
Hence, there is a 50-50 chance that the next 2 crashes will be within 25 days of each other.
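As a cross-check on the integral, the sketch below evaluates the closed form $1 - e^{-\lambda t}$ and compares it with a crude midpoint-rule integration of the density $10e^{-10t}$. The number of integration steps is an arbitrary choice, and the variable names are my own.

```python
import math

rate = 10        # crashes per year
t = 25 / 365     # 25 days expressed in years

# Closed form of the exponential CDF
p_closed = 1 - math.exp(-rate * t)

# Midpoint-rule approximation of the integral of rate * e^(-rate * s) on [0, t]
steps = 10_000   # arbitrary; more steps gives a finer approximation
h = t / steps
p_numeric = sum(rate * math.exp(-rate * (i + 0.5) * h) * h for i in range(steps))

print(f"closed form: {p_closed:.4f}   numeric: {p_numeric:.4f}")
```

Forgetting the unit conversion (using t = 25 instead of 25/365) would give a probability indistinguishable from 1, which is a quick way to spot the mistake.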
Exercise 4.2.29: hybrid exponential/binomial
50 LED lights have just been installed in a subway station. The lights burn out at the rate of 1.1 per 100 months (so the average lifetime of a bulb is 100/1.1 ≈ 91 months). What is the expected number of bulbs that will last for at least 75 months?
Consider the ith bulb: let $Y_i$ represent how long it lasts and $X_i$ indicate whether it lasts the 75 months ($X_i$ is 0 if not and 1 if yes). We use the exponential distribution with $\lambda = 1.1/100$:
$P(Y_i \ge 75) = \int_{75}^{\infty} \frac{1.1}{100}\, e^{-(1.1/100)t}\,dt = \left[-e^{-(1.1/100)t}\right]_{75}^{\infty} = e^{-(1.1/100)(75)} = .44$
The upshot is that $p = P(X_i = 1) = P(Y_i \ge 75) = .44$
Exercise 4.2.29: hybrid expon/binom (cont.)
50 LED lights have just been installed in a subway station. The lights burn out at the rate of 1.1 per 100 months (so the average lifetime of a bulb is 100/1.1 ≈ 91 months). What is the expected number of bulbs that will last for at least 75 months?
Switching gears to use the binomial distribution: $X = \sum X_i$, with n = 50 bulbs each having a p = 44% chance of lasting 75 months, so E(X) = np = 50(.44) = 22.
Hence, on average, 22 of the lights will still work after 75 months.
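The two-stage calculation (exponential survival probability, then binomial expectation) can be sketched as follows, using only the numbers given in the exercise; the variable names are my own.

```python
import math

rate = 1.1 / 100   # burn-outs per month
months = 75
n_bulbs = 50

# Stage 1 (exponential): chance that a single bulb lasts at least 75 months
p_survive = math.exp(-rate * months)

# Stage 2 (binomial): expected number of survivors among n independent bulbs
expected_survivors = n_bulbs * p_survive

print(f"p = {p_survive:.4f}, E(X) = {expected_survivors:.1f}")
```

Carrying the unrounded probability through gives an expectation of about 21.9, which agrees with the slide's answer of 22 obtained after rounding p to .44.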