
Understanding Data Classification and Frequency Distribution
Learn about the importance of classifying data and creating frequency distributions to effectively analyze and present information. Explore how to arrange data, identify common types of classification, and construct frequency distribution tables following basic rules.
Unit II Descriptive Statistics: Measures of Central Tendency
CO2: Identify the use of appropriate statistical terms to describe data.
Note: The material for this presentation and notes has been taken from the internet and books; it is intended only for students' reference and not for commercial use.
CLASSIFICATION OF DATA
After the data have been systematically collected and edited, the first step in presenting them is classification. Classification is the process of arranging data according to their points of similarity and dissimilarity. It is like sorting mail in a post office: the letters are carefully sorted out from the huge heap, and those for different destinations are placed in different compartments.
OBJECTIVES OF CLASSIFICATION
1. To condense the mass of data so that its salient features can be readily noticed.
2. To facilitate comparisons between attributes of variables.
3. To prepare data that can be presented in tabular form.
4. To highlight the significant features of the data at a glance.
Some common types of classification are:
1. Geographical: according to area or region.
2. Chronological: according to the time at which an event occurs.
3. Qualitative: according to attributes.
4. Quantitative: according to magnitudes.
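To make the four types of classification concrete, here is a minimal Python sketch that sorts a handful of hypothetical records (region, year, gender, monthly income; all invented for illustration) into classes, much like sorting mail into compartments. The helper name `classify` and the income band width of 10000 are assumptions, not part of the original notes.

```python
from collections import defaultdict

# Hypothetical records for illustration only: (region, year, gender, monthly_income)
records = [
    ("North", 2021, "F", 18000),
    ("South", 2022, "M", 25000),
    ("North", 2022, "M", 31000),
    ("East", 2021, "F", 12000),
    ("South", 2021, "F", 22000),
]

def classify(rows, key):
    """Group rows into classes using the value returned by key(row)."""
    classes = defaultdict(list)
    for row in rows:
        classes[key(row)].append(row)
    return dict(classes)

geographical = classify(records, key=lambda r: r[0])   # by area or region
chronological = classify(records, key=lambda r: r[1])  # by time of occurrence
qualitative = classify(records, key=lambda r: r[2])    # by attribute (gender)
# By magnitude: income bands of width 10000 (an assumed band size)
quantitative = classify(records, key=lambda r: (r[3] // 10000) * 10000)

for region, rows in sorted(geographical.items()):
    print(region, len(rows))
```

Each type differs only in the key used to decide which compartment a record belongs to.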
Frequency Distribution: When data are arranged into groups or categories according to conveniently established divisions of the range of the observations, the resulting arrangement in tabular form is called a frequency distribution. It shows the number of cases falling within each class interval or range of scores. In other words, a frequency distribution is a table that organises raw data into distinct groups of values describing one characteristic of the data; these groups are called classes, and the number of observations falling into each class is called its frequency. Data described by a continuous variable are called continuous data, and data described by a discrete variable are called discrete data.
Follow these basic rules when constructing a frequency distribution table for a data set that contains a large number of observations:
1. Find the smallest and largest observations.
2. Prepare a column of all possible values of the variable, from largest to smallest.
3. In the next column, put a tally mark against the value to which each observation relates.
4. Count the tally marks and write the count in the next column, against the corresponding value.
Example: The following are the marks obtained by 25 students in a monthly test marked out of 20. Make a discrete frequency distribution of the data.
Ans: Given: 17, 16, 14, 15, 16, 17, 14, 16, 15, 17, 13, 15, 13, 16, 16, 15, 16, 17, 14, 15, 16, 16, 17, 14, 17
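The four rules above can be sketched in Python for the 25 marks in this example. This is a minimal illustration, not a prescribed method: the function name `discrete_frequency_table` is hypothetical, tally marks are drawn as "||||/" for each group of five, and rule 2 is implemented with `range`, which assumes integer-valued marks.

```python
from collections import Counter

# Marks (out of 20) of the 25 students in the example.
marks = [17, 16, 14, 15, 16, 17, 14, 16, 15, 17, 13, 15, 13,
         16, 16, 15, 16, 17, 14, 15, 16, 16, 17, 14, 17]

def discrete_frequency_table(data):
    """Return (value, tally, frequency) rows following the four rules."""
    counts = Counter(data)
    smallest, largest = min(data), max(data)            # rule 1
    rows = []
    # Rule 2: every possible value from largest to smallest (assumes integers).
    for value in range(largest, smallest - 1, -1):
        freq = counts.get(value, 0)
        # Rule 3: tally marks, one "||||/" block for each group of five.
        groups = ["||||/"] * (freq // 5)
        if freq % 5:
            groups.append("|" * (freq % 5))
        tally = " ".join(groups)
        rows.append((value, tally, freq))               # rule 4: count of tallies
    return rows

rows = discrete_frequency_table(marks)
for value, tally, freq in rows:
    print(f"{value:>5}  {tally:<12} {freq}")
print("Total", sum(freq for _, _, freq in rows))
```

The frequency column should always add up to the number of observations (25 here), which is a quick check on the tallying.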
Eg. The weekly wages (in Rs.) paid to the workers are given below. Form a discrete frequency distribution: 300, 240, 240, 150, 120, 240, 120, 120, 150, 150, 150, 240, 150, 150, 120, 300, 120, 150, 240, 150, 150, 120, 240, 150, 240, 150, 120, 120, 240, 150.
Eg. Consider a sample study in which 50 families were surveyed to find the number of children per family. The data obtained are: 3 2 2 1 3 4 2 1 3 4 5 0 2 1 2 3 3 2 1 1 2 3 0 3 2 1 4 3 5 5 4 3 6 5 4 3 1 0 6 5 4 3 1 2 0 1 2 3 4 5
Eg. For the following raw data, form a discrete frequency distribution: 30,32,32,38,34,32,30,30,32,34,30,32,32,28,32,30,28,30,32, 32,30,28 and 30.
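For the three exercises above, a frequency distribution can be checked with a short Python sketch. Here only the values that actually occur are listed (in ascending order), which suits data such as the wages where most numbers in the range never appear; the helper name `frequency_distribution` is an assumption for illustration.

```python
from collections import Counter

def frequency_distribution(data):
    """Map each distinct value to its frequency, in ascending order of value."""
    return dict(sorted(Counter(data).items()))

wages = [300, 240, 240, 150, 120, 240, 120, 120, 150, 150, 150, 240, 150, 150, 120,
         300, 120, 150, 240, 150, 150, 120, 240, 150, 240, 150, 120, 120, 240, 150]
children = [3, 2, 2, 1, 3, 4, 2, 1, 3, 4, 5, 0, 2, 1, 2, 3, 3, 2, 1, 1, 2, 3, 0, 3, 2,
            1, 4, 3, 5, 5, 4, 3, 6, 5, 4, 3, 1, 0, 6, 5, 4, 3, 1, 2, 0, 1, 2, 3, 4, 5]
raw = [30, 32, 32, 38, 34, 32, 30, 30, 32, 34, 30, 32, 32, 28, 32, 30, 28, 30, 32,
       32, 30, 28, 30]

for name, data in [("Weekly wages (Rs.)", wages),
                   ("Children per family", children),
                   ("Raw data", raw)]:
    table = frequency_distribution(data)
    print(name, table, "total =", sum(table.values()))
```

In each case the total of the frequencies equals the number of observations (30, 50 and 23 respectively).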
Frequency Distribution of Continuous Variables
A continuous frequency distribution is a series in which the data are classified into class intervals without gaps, and the number of observations falling in each class interval is recorded as its frequency.
Step 1: Determine the range of the data set.
Step 2: Divide the range by the number of classes required and round up; the result is the class width.
Step 3: Create class intervals using the class width.
Step 4: Obtain the frequency of each class.
Eg. Consider the following data. 17, 30, 37, 34, 39, 32, 30, 35, 12, 14, 12, 14, 14, 0, 25, 25, 25, 28, 47, 42, 49, 49, 45, 49, 46, 41, 60, 64, 62, 40, 43, 48, 48, 49, 49, 40, 41, 59, 51, 53, 82, 80, 85, 90, 98, 90,56, 55, 57, 55, 10, 14, 51, 50, 56, 70, 75, 64, 60, 66, 69, 62, 61, 70, 76, 70, 59, 56, 59, 57, 59, 55, 20, 22, 56, 51,55, 56, 55, 50, 54, 66, 69, 64, 66, 60, 65, 62, 45, 47, 44, 40, 44, 65, 66, 65, 71, 82, 82, 90
Step 1: Determine the range of the data set.
Maximum value = 98, Minimum value = 0
Range = Maximum value - Minimum value = 98 - 0 = 98
Step 2: Divide the range by the number of classes required and round up.
Let the number of classes be 10. Class width = 98/10 = 9.8, which rounds up to 10, so we take 10 as the class size.
Step 3: Create class intervals using the class width. For the above data, exclusive class intervals can be used; when recording frequencies, a value equal to the upper limit of a class is not counted in that class but in the next one: 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100.
Step 4: Obtain the frequency of each class.
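Steps 1 to 4 can be sketched in Python for the 100 values in this example. This is a minimal illustration under stated assumptions: the function name `continuous_frequency_table` is hypothetical, the first class starts at 0 as in the notes, and classes are exclusive, i.e. each class [lower, upper) excludes its upper limit.

```python
import math

data = [17, 30, 37, 34, 39, 32, 30, 35, 12, 14, 12, 14, 14, 0, 25, 25, 25, 28, 47, 42,
        49, 49, 45, 49, 46, 41, 60, 64, 62, 40, 43, 48, 48, 49, 49, 40, 41, 59, 51, 53,
        82, 80, 85, 90, 98, 90, 56, 55, 57, 55, 10, 14, 51, 50, 56, 70, 75, 64, 60, 66,
        69, 62, 61, 70, 76, 70, 59, 56, 59, 57, 59, 55, 20, 22, 56, 51, 55, 56, 55, 50,
        54, 66, 69, 64, 66, 60, 65, 62, 45, 47, 44, 40, 44, 65, 66, 65, 71, 82, 82, 90]

def continuous_frequency_table(values, num_classes, start=None):
    """Build exclusive classes [lower, upper) and count the values in each."""
    smallest, largest = min(values), max(values)
    data_range = largest - smallest                      # Step 1: range
    width = math.ceil(data_range / num_classes)          # Step 2: divide, round up
    if start is None:
        start = smallest
    classes = [(start + i * width, start + (i + 1) * width)   # Step 3: intervals
               for i in range(num_classes)]
    # Step 4: a value equal to an upper limit is counted in the next class.
    freqs = [sum(1 for v in values if lower <= v < upper) for lower, upper in classes]
    if sum(freqs) != len(values):
        raise ValueError("some values fall outside the classes; add or widen a class")
    return width, classes, freqs

width, classes, freqs = continuous_frequency_table(data, num_classes=10, start=0)
for (lower, upper), f in zip(classes, freqs):
    print(f"{lower}-{upper}: {f}")
```

The final check matters with exclusive classes: if the maximum happened to equal the last upper limit, it would fall outside every class.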
Eg. Prepare a frequency distribution by inclusive method taking a class interval of 7 from the following data: 28, 17, 15, 22, 29, 21, 23, 27, 18, 12, 7, 9, 4, 1, 8, 3, 10, 5, 33, 27, 21, 15, 3, 36, 27, 18, 9, 2, 4, 6, 32, 31, 29, 18, 14, 13, 15, 11, 9, 7, 1, 37, 32, 28, 26, 24, 20, 19, 2 2, 4, 20, 16, 12, 8, 5, 5, 19, 20, 6, 9.
Solution: For the given data, Range = Maximum value - Minimum value = 37 - 1 = 36. Number of classes = 36/7 = 5.1 (since the class interval is given as 7). Thus, we can define 5 classes. The inclusive class intervals can be written as: 0-7, 8-15, 16-23, 24-31, 32-39.
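A short Python sketch can count the frequencies for the five inclusive classes written in the solution, where both limits belong to the class. One assumption is made loudly here: the garbled "2 2" in the data is read as 22, which gives 60 observations.

```python
# Inclusive classes as written in the solution: both limits belong to the class.
classes = [(0, 7), (8, 15), (16, 23), (24, 31), (32, 39)]

# Data from the example; the garbled "2 2" is ASSUMED to be 22 (60 values in all).
data = [28, 17, 15, 22, 29, 21, 23, 27, 18, 12, 7, 9, 4, 1, 8, 3, 10, 5, 33, 27,
        21, 15, 3, 36, 27, 18, 9, 2, 4, 6, 32, 31, 29, 18, 14, 13, 15, 11, 9, 7,
        1, 37, 32, 28, 26, 24, 20, 19, 22, 4, 20, 16, 12, 8, 5, 5, 19, 20, 6, 9]

data_range = max(data) - min(data)   # 37 - 1 = 36
# "lower <= v <= upper": the upper limit is included (inclusive method).
freqs = [sum(1 for v in data if lower <= v <= upper) for lower, upper in classes]

for (lower, upper), f in zip(classes, freqs):
    print(f"{lower}-{upper}: {f}")
print("Total", sum(freqs))
```

Because the data are whole numbers, the gaps between inclusive classes (for example between 7 and 8) never hide an observation.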
Eg. Let us consider the following example regarding daily maximum temperatures in a city for 50 days. 28, 28, 31, 29, 35, 33, 28, 31, 34, 29, 25, 27, 29, 33, 30, 31, 32, 26, 26, 21, 21, 20, 22, 24, 28, 30, 34, 33, 35, 29, 23, 21, 20, 19, 19, 18, 19, 17, 20, 19, 18, 18, 19, 27, 17, 18, 20, 21, 18, 19.
Sol: Minimum value = 17, Maximum value = 35, Range = 35 - 17 = 18. Number of classes = 5 (say). Width of each class = 18/5 = 3.6, rounded up to 4.
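The same steps can be applied to the temperature data in Python. This sketch makes two assumptions that the notes leave open: the first class starts at the minimum value (17), and the classes are exclusive, each excluding its upper limit.

```python
import math

temps = [28, 28, 31, 29, 35, 33, 28, 31, 34, 29, 25, 27, 29, 33, 30, 31, 32, 26, 26, 21,
         21, 20, 22, 24, 28, 30, 34, 33, 35, 29, 23, 21, 20, 19, 19, 18, 19, 17, 20, 19,
         18, 18, 19, 27, 17, 18, 20, 21, 18, 19]

num_classes = 5
data_range = max(temps) - min(temps)            # 35 - 17 = 18
width = math.ceil(data_range / num_classes)     # 18 / 5 = 3.6, rounded up to 4
lower_limits = [min(temps) + i * width for i in range(num_classes)]
classes = [(low, low + width) for low in lower_limits]   # assumed exclusive classes
freqs = [sum(1 for t in temps if low <= t < high) for low, high in classes]

for (low, high), f in zip(classes, freqs):
    print(f"{low}-{high}: {f}")
```

Rounding the width up (rather than down) is what guarantees the five classes reach past the maximum of 35.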
Graphic Representation of Frequency Distribution
1. Histogram
2. Frequency Polygon
Histogram
A histogram is a graph that displays data using contiguous vertical bars (unless the frequency of a class is 0) whose heights represent the frequencies of the classes. When data are classified into class intervals, they can be represented by a histogram. A histogram is similar to a bar diagram, except that there is no gap between the bars, since the classes are continuous. The bars are drawn only in outline, without the colouring or marking used in simple bar diagrams. It is a suitable form for representing a frequency distribution.
An inclusive method is one in which there is generally a gap between the upper limit of one class interval and the lower limit of the next. For example, 0-9, 10-19, 20-29 are inclusive classes because each includes its upper limit (9, 19, 29, etc.).
An exclusive method is one in which there is no such gap: the upper limit of one class interval equals the lower limit of the next. For example, 0-10, 10-20, 20-30 are exclusive classes because 10, 20, 30 are not included in the classes of which they are the upper limits.
Eg. Draw a histogram for the following distribution of the marks obtained by 60 students of a class in a college.
Marks:            20-24  25-29  30-34  35-39  40-44  45-49  50-54
No. of students:      3      5     12     18     14      6      2
Solution: Here the class intervals are of the inclusive type. Since the upper limit of a class is not equal to the lower limit of the following class, the class boundaries have to be determined: subtract 0.5 from each lower limit and add 0.5 to each upper limit.
Marks:            19.5-24.5  24.5-29.5  29.5-34.5  34.5-39.5  39.5-44.5  44.5-49.5  49.5-54.5
No. of students:          3          5         12         18         14          6          2
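Converting inclusive limits to class boundaries can be sketched in Python for this marks example. The helper `class_boundaries` is hypothetical: it splits the gap between consecutive classes in half (here 0.5). As a stand-in for a drawn histogram, the sketch prints one horizontal text bar per class, with bar length equal to the frequency.

```python
# Inclusive classes and frequencies from the marks example.
classes = [(20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49), (50, 54)]
students = [3, 5, 12, 18, 14, 6, 2]

def class_boundaries(inclusive_classes):
    """Convert inclusive limits to continuous boundaries by splitting each gap in half."""
    gap = inclusive_classes[1][0] - inclusive_classes[0][1]   # 25 - 24 = 1
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in inclusive_classes]

boundaries = class_boundaries(classes)

# Rough text histogram: horizontal bars stand in for the vertical bars of a drawing.
for (lower, upper), f in zip(boundaries, students):
    print(f"{lower:5.1f}-{upper:5.1f} | {'#' * f} {f}")
```

After conversion, the upper boundary of each class equals the lower boundary of the next, so the bars can be drawn touching each other.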
Eg. Construct a histogram to represent the data shown for the record high temperatures for each of the 50 states.
Frequency Polygon
A frequency polygon is a graphical representation of the distribution of data that helps in understanding the given information through its shape. Histograms and frequency polygons are similar in many respects; however, frequency polygons are more useful when comparing two sets of data. Like a histogram, a frequency polygon is used to represent and compare data given in the form of a frequency distribution. Karl Pearson, an English statistician, made the initial presentation of it in the late 19th century.
Step 1: Mark the class intervals on the x-axis; frequencies are plotted on the y-axis.
Step 2: Calculate the midpoint of each class interval; these midpoints are the class marks.
Step 3: Once the class marks are obtained, mark them on the x-axis.
Step 4: Since height represents frequency, plot the frequency of each class against its class mark, not against the upper or lower limit.
Step 5: Once the points are marked, join them with line segments, as in a line graph.
Step 6: The figure obtained by joining these line segments is the frequency polygon.
Class Mark (Midpoint) = (Upper Limit + Lower Limit) / 2
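Steps 2 to 5 can be sketched in Python, reusing the class boundaries and frequencies of the marks example from the histogram section. The sketch only computes the points and line segments a plotting tool would draw; the names `class_mark`, `points` and `segments` are illustrative assumptions.

```python
# Class boundaries and frequencies from the marks example (19.5-24.5, ..., 49.5-54.5).
boundaries = [(19.5, 24.5), (24.5, 29.5), (29.5, 34.5), (34.5, 39.5),
              (39.5, 44.5), (44.5, 49.5), (49.5, 54.5)]
frequencies = [3, 5, 12, 18, 14, 6, 2]

def class_mark(lower, upper):
    """Class mark (midpoint) = (upper limit + lower limit) / 2."""
    return (upper + lower) / 2

# Steps 2-4: one point per class, at its class mark, with height = frequency.
points = [(class_mark(low, high), f) for (low, high), f in zip(boundaries, frequencies)]
# Step 5: consecutive points are joined by straight line segments.
segments = list(zip(points, points[1:]))

for start, end in segments:
    print(start, "->", end)
```

Seven classes give seven points and therefore six line segments.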
Example 1: Construct a frequency polygon without a histogram using the data given below.
Sol: Class marks = (59.5 + 49.5)/2 = 54.5, (69.5 + 59.5)/2 = 64.5, (79.5 + 69.5)/2 = 74.5, (89.5 + 79.5)/2 = 84.5, (99.5 + 89.5)/2 = 94.5
Example 2: In a city, the weekly observations made in a study on the cost-of-living index are given in the following table. Draw a frequency polygon, together with a histogram, for the data below.
Solution: Class mark = (Upper Limit + Lower Limit) / 2
Class marks = (150 + 140)/2 = 145, (160 + 150)/2 = 155, and so on.
Advantages of Graphical Representation of Frequency Distribution
The advantages of graphic representation of data are as follows:
1. Data can be presented in a more attractive and appealing form.
2. It leaves a more lasting impression on the mind.
3. Comparative analysis and interpretation can be made easily and effectively.
4. Various useful statistics, such as the median, mode and quartiles, can easily be determined.
5. It helps in the proper estimation, evaluation and interpretation of the characteristics of items and individuals.
6. It helps in forecasting, since it indicates the trend of the data in the past.
Limitations of Graphical Representation of Frequency Distribution
1. Parameter estimates obtained graphically are biased.
2. Even with large samples, they are not minimum-variance (i.e., most precise) estimates.
3. Graphical methods do not give confidence intervals for the parameters (intervals generated by a regression program for this kind of data are incorrect).
4. Formal statistical tests about model fit or parameter values cannot be performed with graphical methods.
Central Tendency
Central tendencies in statistics are the numerical values used to represent the middle or central value of a large collection of numerical data. These values are called central or average values. A central or average value of a statistical data set or series is the value of the variable that is representative of the entire data or of its frequency distribution. Such a value is of great significance because it describes the nature or characteristics of the entire data, which would otherwise be very difficult to observe.
REQUISITES OF A GOOD AVERAGE OR MEASURE OF CENTRAL TENDENCY
1. It should be rigidly defined.
2. It should be easy to understand and calculate.
3. It should be based on all the observations.
4. It should be suitable for further mathematical treatment.
5. It should be affected as little as possible by fluctuations of sampling.
6. It should not be affected much by extreme observations.
Measures of Central Tendency
1. Mean
2. Median
3. Mode
Mean
In general usage, "mean" refers to the arithmetic mean of the data, but besides the arithmetic mean there are also the geometric mean and the harmonic mean, which are calculated using different formulas.
Mean for Ungrouped Data
The arithmetic mean ($\bar{x}$) is defined as the sum of the individual observations ($x_i$) divided by the total number of observations $N$:
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$
In other words, Mean = (Sum of all observations) / (Total number of observations).
Example: If there are 5 observations, 27, 11, 17, 19 and 21, then the mean is given by
Sol: $\bar{x} = \frac{27 + 11 + 17 + 19 + 21}{5} = \frac{95}{5} = 19$
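The ungrouped mean formula translates directly into Python. The function name `arithmetic_mean` is illustrative; the standard library's `statistics.mean` is used only as a cross-check.

```python
from statistics import mean

def arithmetic_mean(observations):
    """Sum of all observations divided by the number of observations."""
    if not observations:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / len(observations)

values = [27, 11, 17, 19, 21]
print(arithmetic_mean(values))  # 95 / 5 = 19.0
print(mean(values))             # standard-library cross-check
```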
Mean for Grouped Data
For grouped data, the mean ($\bar{x}$) is defined as the sum of the products of the observations ($x_i$) and their corresponding frequencies ($f_i$), divided by the sum of all the frequencies:
$$\bar{x} = \frac{\sum_{i} f_i x_i}{\sum_{i} f_i}$$
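Example: If the values ($x_i$) of the observations and their frequencies ($f_i$) are as follows:
Value ($x_i$):      4   6  15  10   9
Frequency ($f_i$):  5  10   8   7  10
then
$$\bar{x} = \frac{4 \times 5 + 6 \times 10 + 15 \times 8 + 10 \times 7 + 9 \times 10}{5 + 10 + 8 + 7 + 10} = \frac{20 + 60 + 120 + 70 + 90}{40} = \frac{360}{40} = 9$$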
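The grouped-data formula can be sketched in Python with the values and frequencies of this example. The function name `grouped_mean` is an illustrative assumption; the last two lines confirm that weighting by frequency gives the same result as averaging the data written out in full.

```python
def grouped_mean(values, frequencies):
    """Sum of f_i * x_i divided by the sum of f_i."""
    if len(values) != len(frequencies):
        raise ValueError("each value needs exactly one frequency")
    total_f = sum(frequencies)
    if total_f == 0:
        raise ValueError("frequencies must not all be zero")
    return sum(f * x for x, f in zip(values, frequencies)) / total_f

x = [4, 6, 15, 10, 9]
f = [5, 10, 8, 7, 10]
print(grouped_mean(x, f))  # 360 / 40 = 9.0

# Same answer as repeating each value f_i times and taking the ordinary mean.
expanded = [xi for xi, fi in zip(x, f) for _ in range(fi)]
print(sum(expanded) / len(expanded))
```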