Frequency Distributions and Histograms for Data Organization


Learn how to organize data effectively through frequency distributions and histograms. Explore examples, solutions, and practical applications for creating frequency tables. Enhance your data analysis skills efficiently.

  • Data Organization
  • Frequency Distributions
  • Histograms
  • Data Analysis
  • Frequency Tables


Presentation Transcript


  1. Organizing Data

  2. Frequency Distributions, Histograms, and Related Topics

  3. Frequency Tables

  4. Frequency Table When we have a large set of quantitative data, it's useful to organize it into smaller intervals or classes and count how many data values fall into each class. A frequency table does just that.

  5. Example 1 Frequency table A task force to encourage car pooling did a study of one-way commuting distances of workers in the downtown Dallas area. A random sample of 60 of these workers was taken. The commuting distances of the workers in the sample are given in Table 2-1. Make a frequency table for these data using 6 classes. Table 2-1: One-Way Commuting Distances (in Miles) for 60 Workers in Downtown Dallas.

  6. Example 1 Solution a. First decide the number of classes. Usually 5 to 15 classes are used. If you use fewer than 5 classes, you risk losing too much information. If you use more than 15 classes, the data may not be sufficiently summarized. The spread of the data and the purpose of the frequency table are the guides when selecting the number of classes. In the case of the commuting data, we use six classes. b. Next, find the class width.

  7. Example 1 Solution (continued) To find the class width, we observe that the largest distance commuted is 47 miles and the smallest is 1 mile. Using 6 classes, Class width = (47 - 1)/6 ≈ 7.7 (increase to 8). c. Create the distinct classes. Determine the class limits for each class. The lower class limit is the lowest data value that can fit in a class. The upper class limit is the highest data value that can fit in a class.

  8. Example 1 Solution (continued) We use the convention that the lower class limit of the first class is the smallest data value. Add the class width to this number to get the lower class limit of the next class. The smallest distance in the sample is 1 mile, so the lower class limit of the first class is 1. Since the class width is 8, we add 8 to 1 to find that the lower class limit for the second class is 9. Following this pattern, we establish all the lower class limits. Then we fill in the upper class limits: upper class limit = lower class limit of that class + class width - 1. For the first class, this gives 1 + 8 - 1 = 8.
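
The class-width and class-limit steps above can be sketched in Python. This is only an illustrative sketch for whole-number data: the function name `class_limits` is hypothetical, and it assumes the computed width is always increased to the next whole number, as in the example.

```python
def class_limits(smallest, largest, num_classes):
    """Return (class_width, [(lower, upper), ...]) for whole-number data."""
    # Divide the range by the number of classes, then increase to the
    # next whole number (integer division + 1 always moves up).
    width = (largest - smallest) // num_classes + 1
    limits = []
    for i in range(num_classes):
        lower = smallest + i * width   # step to the next class by adding the width
        upper = lower + width - 1      # upper limit = lower limit + class width - 1
        limits.append((lower, upper))
    return width, limits


# Commuting data: smallest value 1 mile, largest 47 miles, 6 classes.
width, limits = class_limits(1, 47, 6)
print(width)    # 8
print(limits)   # [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 48)]
```

The resulting limits match the six classes used in Table 2-2.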

  9. Example 1 Solution (continued) Table 2-2 shows the upper and lower class limits for the data. Table 2-2: Frequency Table of One-Way Commuting Distances for 60 Downtown Dallas Workers (Data in Miles). Note: The class width is what you add to get from one row to the next for class limits, class boundaries, and class midpoints. Class width is the distance between the lower (or upper) class limits of consecutive classes, and likewise between consecutive class boundaries or class midpoints.

  10. Example 1 Solution (continued) d. Next, tally the data into the six classes and find the frequency for each class. Table 2-2 shows the tally and frequency of each class.

  11. Example 1 Solution (continued) e. Find the midpoints. The center of each class is called the midpoint (or class mark). The midpoint is often used as a representative value of the entire class. The midpoint is found by adding the lower and upper class limits of one class and dividing by 2: Midpoint = (Lower class limit + Upper class limit) / 2. Table 2-2 shows the class midpoints.

  12. Example 1 Solution (continued) f. Find the class boundaries. There is a space between the upper limit of one class and the lower limit of the next class. The halfway points of these intervals are called class boundaries. How to find class boundaries (integer data): To find upper class boundaries, add 0.5 unit to the upper class limit. To find lower class boundaries, subtract 0.5 unit from the lower class limit.
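
As a small illustration of steps e and f, the sketch below computes midpoints and class boundaries from the class limits of the commuting example. The helper names `midpoint` and `class_boundaries` are hypothetical, and the 0.5 adjustment assumes integer data, as stated in the procedure.

```python
def midpoint(lower, upper):
    # Midpoint = (lower class limit + upper class limit) / 2
    return (lower + upper) / 2


def class_boundaries(lower, upper):
    # Integer data: subtract 0.5 from the lower limit, add 0.5 to the upper limit.
    return lower - 0.5, upper + 0.5


# Class limits of the six classes in the commuting example.
limits = [(1, 8), (9, 16), (17, 24), (25, 32), (33, 40), (41, 48)]

midpoints = [midpoint(lo, hi) for lo, hi in limits]
boundaries = [class_boundaries(lo, hi) for lo, hi in limits]

print(midpoints)    # [4.5, 12.5, 20.5, 28.5, 36.5, 44.5]
print(boundaries)   # [(0.5, 8.5), (8.5, 16.5), (16.5, 24.5), (24.5, 32.5), (32.5, 40.5), (40.5, 48.5)]
```

Note that the upper boundary of each class equals the lower boundary of the next, so the boundaries leave no gaps between classes.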

  13. Frequency Table Procedure:

  14. Frequency Histograms

  15. Frequency Histograms Frequency histograms provide effective visual displays of data organized into frequency tables. In these graphs, we use bars to represent each class, where the width of the bar is the class width. For histograms, the height of the bar is the class frequency, whereas for relative-frequency histograms, the height of the bar is the relative frequency of that class.

  16. Example 2 Frequency Histogram Make a histogram and a relative-frequency histogram with six bars for the data in Table 2-1 showing one-way commuting distances. Table 2-1: One-Way Commuting Distances (in Miles) for 60 Workers in Downtown Dallas.

  17. Example 2 Solution The first step is to make a frequency table and a relative-frequency table. Table 2-2: Frequency Table of One-Way Commuting Distances for 60 Downtown Dallas Workers (Data in Miles).

  18. Example 2 Solution (continued) Frequencies of One-Way Commuting Distances

      Class Limits     Class Boundaries   Class Midpoint   Frequency
      Lower - Upper    Lower - Upper
       1 -  8           0.5 -  8.5          4.5              14
       9 - 16           8.5 - 16.5         12.5              21
      17 - 24          16.5 - 24.5         20.5              11
      25 - 32          24.5 - 32.5         28.5               6
      33 - 40          32.5 - 40.5         36.5               4
      41 - 48          40.5 - 48.5         44.5               4
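
The relative-frequency table mentioned in the solution is not reproduced in this transcript. As a hedged sketch, assuming the standard definition (relative frequency = class frequency divided by the total number of data values), it could be computed from the frequencies above as follows:

```python
# Class frequencies from the commuting-distance table.
frequencies = [14, 21, 11, 6, 4, 4]

n = sum(frequencies)                       # total number of data values
relative = [f / n for f in frequencies]    # relative frequency of each class

print(n)                                   # 60
print([round(r, 2) for r in relative])     # [0.23, 0.35, 0.18, 0.1, 0.07, 0.07]
```

The unrounded relative frequencies sum to 1, which is a quick check on the table.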

  19. Example 2 Solution (continued) Figures 2-2 and 2-3 show the histogram and relative-frequency histogram. In both graphs, class boundaries are marked on the horizontal axis. Figure 2-2: Histogram for Dallas Commuters: One-Way Commuting Distances. For a histogram, the height of each bar is the corresponding class frequency.
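
The figures themselves are not included in this transcript. As a rough stand-in, the sketch below prints a text histogram of the commuting data, with one row per class labeled by its class boundaries. The function name `text_histogram` is hypothetical, and the bars run horizontally here rather than vertically as in Figure 2-2.

```python
boundaries = [(0.5, 8.5), (8.5, 16.5), (16.5, 24.5),
              (24.5, 32.5), (32.5, 40.5), (40.5, 48.5)]
frequencies = [14, 21, 11, 6, 4, 4]


def text_histogram(boundaries, frequencies):
    """Return one text row per class; bar length equals the class frequency."""
    rows = []
    for (lo, hi), f in zip(boundaries, frequencies):
        label = f"{lo:>4.1f} - {hi:<4.1f}"
        rows.append(f"{label} | {'#' * f} {f}")
    return rows


for row in text_histogram(boundaries, frequencies):
    print(row)
```

Replacing each frequency with its relative frequency (scaled to a convenient bar length) would give the corresponding relative-frequency display with the same shape.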

  20. Example 3 Find the mean, median, and mode of the data represented in the following frequency table.

      Class Limits     Frequency   Midpoint
      Lower - Upper
       1 -  8            14           4.5
       9 - 16            21          12.5
      17 - 24            11          20.5
      25 - 32             6          28.5
      33 - 40             4          36.5
      41 - 48             4          44.5

      For the mean, compute the midpoint of each interval, multiply each midpoint by the number of data points in the interval, sum the products, and divide by the total number of data points:

      x̄ = (4.5·14 + 12.5·21 + 20.5·11 + 28.5·6 + 36.5·4 + 44.5·4) / (14 + 21 + 11 + 6 + 4 + 4) = 1046/60 ≈ 17.43
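
The grouped-mean calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `grouped_mean` is hypothetical.

```python
midpoints = [4.5, 12.5, 20.5, 28.5, 36.5, 44.5]
frequencies = [14, 21, 11, 6, 4, 4]


def grouped_mean(midpoints, frequencies):
    # Sum of (midpoint * frequency) divided by the total number of data points.
    weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
    return weighted_sum / sum(frequencies)


mean = grouped_mean(midpoints, frequencies)
print(round(mean, 2))   # 17.43
```

Because each data value is replaced by its class midpoint, the result approximates the mean of the raw data rather than reproducing it exactly.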

  21. Example 3 (continued) As there are 60 data points, the median is the average of the 30th and 31st items. The first class holds items 1 through 14 and the second class holds items 15 through 35, so the 30th and 31st items are both in the interval 9-16. The midpoint of this class is 12.5.

      Class Limits     Frequency   Midpoint
      Lower - Upper
       1 -  8            14           4.5
       9 - 16            21          12.5
      17 - 24            11          20.5
      25 - 32             6          28.5
      33 - 40             4          36.5
      41 - 48             4          44.5

      Thus the median here is 12.5. The modal class is the one with the most data points. In this problem, the second interval, 9-16, has the largest frequency. We take the midpoint of that interval for the mode, so the mode is 12.5. Note: In this problem, the median and the mode happen to be the same value, but for most problems the median and mode are different.
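
The median and mode reasoning can be sketched with cumulative frequencies. This is an illustrative sketch that follows the slide's midpoint convention and assumes an even number of data points, as in this example; the helper name `class_of_item` is hypothetical.

```python
midpoints = [4.5, 12.5, 20.5, 28.5, 36.5, 44.5]
frequencies = [14, 21, 11, 6, 4, 4]


def class_of_item(position, frequencies):
    """Return the index of the class holding the item at 1-based `position`."""
    cumulative = 0
    for i, f in enumerate(frequencies):
        cumulative += f
        if position <= cumulative:
            return i
    raise ValueError("position exceeds the number of data points")


n = sum(frequencies)                              # 60 data points
lower_mid = class_of_item(n // 2, frequencies)    # class of the 30th item
upper_mid = class_of_item(n // 2 + 1, frequencies)  # class of the 31st item
median = (midpoints[lower_mid] + midpoints[upper_mid]) / 2

modal_class = frequencies.index(max(frequencies))   # class with the largest frequency
mode = midpoints[modal_class]

print(median)   # 12.5
print(mode)     # 12.5
```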

  22. Practice Problem #1 The following franchise fee data (in $1000) was collected from 30 randomly selected franchises. Using 5 classes, compute the class width, and complete the table. Construct a frequency histogram of the data. Label and scale the axes; include a title. 10 15 15 18 20 20 20 21 24 25 23 25 26 25 24 30 35 38 26 19 40 44 37 17 24 21 32 38 41 20

  23. Practice Problem #1 Class width = (44 - 10)/5 = 34/5 = 6.8. So the class width is 7.

      Class Limits     Class Boundaries   Class Midpoint   Tally        Frequency
      Lower - Upper    Lower - Upper
      10 - 16           9.5 - 16.5          13             |||             3
      17 - 23          16.5 - 23.5          20             |||| ||||      10
      24 - 30          23.5 - 30.5          27             |||| ||||       9
      31 - 37          30.5 - 37.5          34             |||             3
      38 - 44          37.5 - 44.5          41             ||||            5
                                                           Total:         30

      The relative frequencies total 1.00. Place class boundaries on the horizontal axis and frequency on the vertical axis to make a histogram (or relative frequency on the vertical axis for a relative-frequency histogram).
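
The practice-problem table can be reproduced from the raw franchise fee data. This is a minimal sketch for whole-number data; the variable names are illustrative, and the width is increased to the next whole number as in the solution.

```python
# Franchise fees (in $1000) for the 30 sampled franchises.
fees = [10, 15, 15, 18, 20, 20, 20, 21, 24, 25,
        23, 25, 26, 25, 24, 30, 35, 38, 26, 19,
        40, 44, 37, 17, 24, 21, 32, 38, 41, 20]
num_classes = 5

smallest, largest = min(fees), max(fees)
# (44 - 10) / 5 = 6.8, increased to the next whole number.
width = (largest - smallest) // num_classes + 1

limits = [(smallest + i * width, smallest + (i + 1) * width - 1)
          for i in range(num_classes)]
# Count how many data values fall within each pair of class limits.
frequencies = [sum(lo <= x <= hi for x in fees) for lo, hi in limits]

print(width)         # 7
print(limits)        # [(10, 16), (17, 23), (24, 30), (31, 37), (38, 44)]
print(frequencies)   # [3, 10, 9, 3, 5]
```

The frequencies sum to 30, the sample size, which confirms that every data value landed in exactly one class.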

  24. Distribution Shapes

  25. Distribution Shapes Histograms are valuable and useful tools. If the raw data came from a random sample of population values, the histogram constructed from the sample values should have a distribution shape that is reasonably similar to that of the population. Several terms are commonly used to describe histograms and their associated population distributions.

  26. Distribution Shapes a) Mound-shaped symmetrical: This term refers to a histogram in which both sides are (more or less) the same when the graph is folded vertically down the middle. See Figure (a). The distribution of IQ scores across the general population is roughly symmetric, with a central mound near 100 and two tails.

  27. Distribution Shapes b) Uniform or rectangular: These terms refer to a histogram in which every class has equal frequency. From one point of view, a uniform distribution is symmetrical with the added property that the bars are of the same height. See Figure (b).

  28. Distribution Shapes c) Skewed left or skewed right: These terms refer to a histogram in which one tail is stretched out longer than the other. The direction of skewness is on the side of the longer tail. So, if the longer tail is on the left, we say the histogram is skewed to the left. See Figure (c). The distribution has its peak on one side and a long tail on the other side. Skewed left: the distribution of students' scores on an easy exam, because few students will fail while most of them will do very well. Skewed right: the distribution of income. Most people have moderate incomes, while a few are very rich.

  29. Distribution Shapes d) Bimodal: This term refers to a histogram in which the two classes with the largest frequencies are separated by at least one class. The top two frequencies may have slightly different values. This type of situation sometimes indicates that we are sampling from two different populations. See Figure (d). A common example is the distribution of people's heights: if we don't separate men and women, the combined data tend to show a bimodal effect.
