Distributions and Outliers in Magnitude Data

Explore variance calculation, distribution comparisons, and outlier detection in magnitude data from South America and North America using stem-and-leaf displays, boxplots, and more. Learn how to identify and address outliers effectively.

  • Distributions
  • Outliers
  • Magnitude Data
  • Boxplots
  • Variance

Presentation Transcript


  1. Lecture 6

  2. So, what is variance? To avoid confusion, I decided to use the formula from the book, $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$. However, I will call it the "corrected variance".
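
A minimal sketch of this calculation in Python, assuming the book's formula is the usual sum of squared deviations from the mean divided by n - 1. The function name `corrected_variance` and the four sample values are made up for illustration; the standard library's `statistics.variance` uses the same n - 1 divisor, so it can serve as a cross-check.

```python
import statistics


def corrected_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)


# Hypothetical sample, not taken from the lecture.
sample = [1, 2, 3, 4]
print(corrected_variance(sample))
print(statistics.variance(sample))  # same divisor, so the two should agree
```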

  3. Understanding and comparing distributions

  4. Magnitudes only
     South America: 6.7 8.2 7.6 5.1 4.9 7.1 8.3 5.3 6.9 7.6 7.6 (median = 7.1, mean = 6.85)
     North America: 5.1 7.2 6.4 7.9 6.9 6.1 6.3 6.0 6.9 (median = 6.4, mean = 6.53)
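
The summaries on this slide can be rechecked with Python's standard `statistics` module. This is only a sketch; the variable names are my own, and the values are copied from the slide.

```python
import statistics

south_america = [6.7, 8.2, 7.6, 5.1, 4.9, 7.1, 8.3, 5.3, 6.9, 7.6, 7.6]
north_america = [5.1, 7.2, 6.4, 7.9, 6.9, 6.1, 6.3, 6.0, 6.9]

for name, data in [("South America", south_america),
                   ("North America", north_america)]:
    median = statistics.median(data)
    mean = round(statistics.mean(data), 2)
    print(f"{name}: n = {len(data)}, median = {median}, mean = {mean}")
```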

  5. [Figure: histograms of magnitudes. South America: bins of width 0.3 from [4.9, 5.2] to (8.2, 8.5]. North America: bins of width 0.3 from [5.1, 5.4] to (7.8, 8.1].] The distribution for North America is more symmetric; also, in South America a typical magnitude is higher.
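
The bar heights behind these histograms can be recounted with a short sketch. It assumes the bins are exactly as labeled on the slide: width 0.3, closed on the right, with the first bin also closed on the left. The helper name `bin_counts` is hypothetical, and the code works in integer tenths so that values sitting exactly on a bin edge (such as 7.6) are not misplaced by floating-point rounding.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in [start, start+w], (start+w, start+2w], and so on.

    Assumes every value lies within the binned range.
    """
    start_t = round(start * 10)   # bin origin in tenths
    width_t = round(width * 10)   # bin width in tenths
    counts = [0] * n_bins
    for v in values:
        offset = round(v * 10) - start_t
        # Integer ceiling of offset / width, minus 1, gives the bin index;
        # a value equal to the very first edge goes into bin 0.
        index = max(0, -(-offset // width_t) - 1)
        counts[index] += 1
    return counts


south_america = [6.7, 8.2, 7.6, 5.1, 4.9, 7.1, 8.3, 5.3, 6.9, 7.6, 7.6]
north_america = [5.1, 7.2, 6.4, 7.9, 6.9, 6.1, 6.3, 6.0, 6.9]

print(bin_counts(south_america, 4.9, 0.3, 12))
print(bin_counts(north_america, 5.1, 0.3, 10))
```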

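  6. Stem-and-leaf display (stem = whole magnitude, leaf = tenths; for example, 5 | 1 means 5.1)
       South America |   | North America
                   9 | 4 |
                  13 | 5 | 1
                  79 | 6 | 013499
                1666 | 7 | 29
                  23 | 8 |
     We see again that in NA the typical magnitude is on stem 6 (6.0 to 6.9), and in SA magnitudes are in general higher.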
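
A stem-and-leaf display like this one can be built in a few lines. The sketch below assumes one-decimal data, so the stem is the whole part and the leaf is the tenths digit; the function name `stem_and_leaf` is my own.

```python
def stem_and_leaf(values):
    """Map each stem (whole part) to its sorted leaves (tenths digit)."""
    tenths = sorted(round(v * 10) for v in values)
    # Include every stem between the smallest and largest, even if empty.
    display = {stem: [] for stem in range(tenths[0] // 10, tenths[-1] // 10 + 1)}
    for t in tenths:
        stem, leaf = divmod(t, 10)
        display[stem].append(leaf)
    return display


north_america = [5.1, 7.2, 6.4, 7.9, 6.9, 6.1, 6.3, 6.0, 6.9]
for stem, leaves in stem_and_leaf(north_america).items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```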

  7. Boxplots [Figure: side-by-side boxplots of magnitudes for South America and North America; vertical axis from 4 to 9.]

  8. More boxplots: NA years 2010-2015 [Figure: boxplots by year; vertical axis from 1 to 8.]

  9. Outliers. An outlier is a value that does not fit in the data. However, there is no exact definition of "does not fit". Boxplots provide one rule: anything that is greater than UQ + 1.5 × IQR or smaller than LQ - 1.5 × IQR is an outlier, where LQ and UQ are the lower and upper quartiles and IQR = UQ - LQ.
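
The boxplot rule can be sketched in Python as follows. Quartiles have several competing definitions, and the lecture does not say which one it uses, so the choice of `statistics.quantiles` with `method="inclusive"` is an assumption; a different convention can move the fences slightly. The function name `boxplot_outliers` and the added value 12.0 are hypothetical.

```python
import statistics


def boxplot_outliers(values):
    """Return values beyond 1.5 x IQR from the quartiles."""
    # Assumed quartile convention: "inclusive" linear interpolation.
    lq, _, uq = statistics.quantiles(values, n=4, method="inclusive")
    iqr = uq - lq
    low, high = lq - 1.5 * iqr, uq + 1.5 * iqr
    return [v for v in values if v < low or v > high]


north_america = [5.1, 7.2, 6.4, 7.9, 6.9, 6.1, 6.3, 6.0, 6.9]
print(boxplot_outliers(north_america))           # magnitudes from the lecture
print(boxplot_outliers(north_america + [12.0]))  # with a made-up extreme value
```

Under this convention none of the lecture's North America magnitudes is flagged, while the made-up value is.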

  10. So what should we do? One thing we can do is correct the value, if we are sure how. For example, if someone claims to be born in 20009, then most probably it is 2009. A second good idea is to analyze two versions of the data: one with the identified outlier and one without.
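
Both suggestions can be tried side by side. In the sketch below the list of birth years is invented for illustration (only the 20009 typo comes from the slide), and `summary` is a hypothetical helper that reports the mean and the median.

```python
import statistics

# Hypothetical birth years; 20009 is the typo described on the slide.
birth_years = [2008, 2009, 2010, 2009, 20009, 2011]
suspect = 20009

without_outlier = [y for y in birth_years if y != suspect]
corrected = [2009 if y == suspect else y for y in birth_years]


def summary(data):
    """Return (mean, median) so the versions can be compared."""
    return statistics.mean(data), statistics.median(data)


for label, data in [("with outlier", birth_years),
                    ("without outlier", without_outlier),
                    ("corrected", corrected)]:
    mean, median = summary(data)
    print(f"{label}: mean = {mean:.1f}, median = {median}")
```

The mean moves by thousands of years when the typo is present, while the median barely changes, which is why comparing the versions is informative.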

  11. Timeplots. Suppose we have a lot of data: say, one number for every day. Previously we would summarize it in a histogram. But in that case we lose the dynamics: we do not see how things changed day by day. Also, we might want to see what happened on each day, not just on average.

  12. Timeplots. Timeplots are good if we have "continuous time". We would not make one for, say, 100 vehicle models and their mpg values. We would make one in the day-by-day situation (and year-by-year too).

  13. Daily temperature in Nashville (in F) January February March April May June July August September October November December 37 37 37 37 37 37 37 37 37 36 36 36 36 36 36 36 36 36 37 37 37 37 37 37 37 37 37 37 38 38 38 38 38 38 39 39 39 39 40 40 40 40 40 41 41 41 42 42 42 42 43 43 43 44 44 44 44 45 45 45 46 46 46 47 47 47 48 48 48 49 49 49 50 50 50 51 51 51 51 52 52 52 53 53 53 53 54 54 54 54 55 55 55 55 56 56 56 56 57 57 57 58 58 58 58 59 59 59 59 60 60 60 61 61 61 61 62 62 62 62 63 63 63 64 64 64 64 65 65 65 66 66 66 67 67 67 67 68 68 68 69 69 69 69 70 70 70 71 71 71 71 72 72 72 72 73 73 73 74 74 74 74 74 75 75 75 75 76 76 76 76 76 77 77 77 77 77 77 78 78 78 78 78 78 78 78 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 80 80 80 80 80 80 80 80 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 79 78 78 78 78 78 78 78 77 77 77 77 77 77 76 76 76 76 76 75 75 75 75 75 74 74 74 74 73 73 73 72 72 72 71 71 71 70 70 70 69 69 69 68 68 67 67 67 66 66 65 65 65 64 64 63 63 63 62 62 61 61 61 60 60 59 59 59 58 58 58 57 57 56 56 56 55 55 55 54 54 54 53 53 53 52 52 52 51 51 51 50 50 50 49 49 49 48 48 48 48 47 47 47 46 46 46 45 45 45 44 44 44 43 43 43 43 42 42 42 42 41 41 41 41 40 40 40 40 39 39 39 39 39 38 38 38 38 38 38 37

  14. Nothing was observable in that table. Computing a single average does not make much sense: who cares about the average over a whole year? First, let's compute the monthly averages.
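
A minimal sketch of the monthly-average step, assuming the daily values are stored in calendar order in one flat list. The function name `monthly_averages` and the short example data are made up; for a full non-leap year the month lengths would be 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31.

```python
def monthly_averages(daily, month_lengths):
    """Average consecutive blocks of daily values, one block per month."""
    if sum(month_lengths) != len(daily):
        raise ValueError("month lengths do not match the number of days")
    averages = []
    start = 0
    for length in month_lengths:
        block = daily[start:start + length]
        averages.append(round(sum(block) / length, 1))
        start += length
    return averages


# Hypothetical data: three short "months" of 3, 2 and 3 days.
daily = [37, 36, 38, 41, 43, 50, 52, 51]
print(monthly_averages(daily, [3, 2, 3]))
```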

  15. Monthly averages:
      January    36.8
      February   41.3
      March      50.1
      April      58.5
      May        67.1
      June       75.1
      July       79.1
      August     77.9
      September  71.3
      October    59.9
      November   49.3
      December   40.5
      That's already a bit better. Now, we want to picture it. One option is the timeplot.

  16. A timeplot of monthly averages [Figure: average temperature, axis from 0 to 90, against month 1 to 12.]

  17. [Figure: the same timeplot of monthly averages, with the points connected by line segments.] Here the connecting segments just show us the direction: does the temperature increase or decrease? We see a clear pattern here, which we could predict from common sense.
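
What the connecting segments show can also be read off numerically. The sketch below labels each month-to-month step in the monthly averages listed earlier as up, down or flat; the function name `directions` is my own.

```python
monthly = [36.8, 41.3, 50.1, 58.5, 67.1, 75.1,
           79.1, 77.9, 71.3, 59.9, 49.3, 40.5]


def directions(series):
    """Label each step between consecutive values as up, down or flat."""
    labels = []
    for prev, curr in zip(series, series[1:]):
        if curr > prev:
            labels.append("up")
        elif curr < prev:
            labels.append("down")
        else:
            labels.append("flat")
    return labels


print(directions(monthly))
```

The averages rise in every step through July and fall in every step afterwards, which is the pattern visible in the plot.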

  18. A timeplot for daily temperatures [Figure: daily temperature, axis from 30 to 90, against day of the year.]

  19. [Figure: timeplot of daily temperatures, axis from 30 to 90, against day of the year 1 to 365.] One particularly interesting thing we see from the timeplot is that the daily temperatures first only increase, then only decrease, with very few exceptions! We would not get this from a histogram or a boxplot.

  20. Warning: on page 98 it is written that "Excel cannot make boxplots". This is a lie: Excel can make boxplots very well.

  21. Homework: Read Chapter 4. Be careful with Section 4.5: it is a little vague. Do pp. 99+: 2, 4 (write a few sentences about similarities, differences and patterns), 16, 18a, 19ade, 22bc, 25, 30abc, 31.
