Machine Learning
Explore the fundamentals of machine learning, from the main types of learning and algorithms such as decision trees to applications in supervised and unsupervised learning. Understand, through examples and definitions, how computers learn without being explicitly programmed, and see how decision tree induction is used to classify real-life data.
Presentation Transcript
Chapter 5: Machine Learning
Dr Vasu Pinnti, ICT, 3/3/2025
Contents
o What is machine learning?
o Types of machine learning
o Applications of machine learning
o Supervised learning (classification)
o Decision tree algorithm
o Bayesian classification algorithm
o Unsupervised learning (clustering)
o K-means clustering algorithm
What is machine learning?
Machine learning: "the field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel).
Types of learning algorithms
o Supervised learning: teach the computer how to do something, then let it use its new-found knowledge to do it.
o Unsupervised learning: let the computer learn how to do something, and use this to determine structure and patterns in data.
o Reinforcement learning: let the computer learn by interacting with its environment, using rewards and penalties as feedback.
Supervised vs unsupervised
o Supervised learning, e.g. the decision tree and Bayesian classification algorithms.
o Unsupervised learning, e.g. the k-means clustering algorithm.
Applications of machine learning
Decision tree (DT)
A decision tree has three types of nodes:
o A root node, which has no incoming edges and zero or more outgoing edges.
o Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.
o Leaf (terminal) nodes, each of which has exactly one incoming edge and no outgoing edges.
Solving the classification problem using a DT is a two-step process (see the sketch after this list):
1) Decision tree induction: construct a DT using training data.
2) For each tuple ti in D, apply the DT to determine its class.
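To make the two steps concrete, here is a minimal sketch in Python using scikit-learn (an assumption: the slides do not prescribe a library, and the toy data below is illustrative, not from the slides).

```python
# A minimal sketch of the two-step DT process, assuming scikit-learn.
from sklearn.tree import DecisionTreeClassifier

# Step 1: decision tree induction - construct a DT from training data D.
# Toy data: each row is a tuple ti, y_train holds its class label.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = ["no", "no", "yes", "yes"]

dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

# Step 2: for each tuple ti in D, apply the DT to determine its class.
print(dt.predict([[1, 0]]))  # -> ['yes']
```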
Decision tree (DT): example

Training data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model (decision tree), with Refund as the first splitting attribute:
o Refund = Yes -> NO
o Refund = No -> split on Marital Status:
  - Married -> NO
  - Single or Divorced -> split on Taxable Income: < 80K -> NO, > 80K -> YES
Decision tree (DT): example (contd.)
The same training data also fits a second tree, this one with Marital Status at the root:
o MarSt = Married -> NO
o MarSt = Single or Divorced -> split on Refund:
  - Refund = Yes -> NO
  - Refund = No -> split on Taxable Income: < 80K -> NO, > 80K -> YES
There could be more than one tree that fits the same data!
Decision tree: applying the model to test data
Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
Start from the root of the (first) tree:
o Refund = No, so follow the No branch to the Marital Status node.
o Marital Status = Married, so follow the Married branch, which leads to a leaf.
o The leaf is labelled NO, so assign Cheat = No.
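The traversal above can be transcribed directly as plain conditionals. A minimal sketch (the function name and the handling of exactly 80K are our illustrative choices, not from the slides):

```python
# The first tree from the slides, written as plain if/else rules.
def classify_cheat(refund, marital_status, taxable_income):
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income at the 80K threshold.
    # The slide only shows < 80K and > 80K; exactly 80K is assumed "Yes" here.
    return "No" if taxable_income < 80_000 else "Yes"

# The test record from the slide: Refund = No, Married, 80K.
print(classify_cheat("No", "Married", 80_000))  # -> No (assign Cheat = No)
```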
Decision tree: another example
Training dataset for the buys_computer example:

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
Decision tree: another example (contd.)
Output: a decision tree for buys_computer, with age at the root:
o age <= 30 -> split on student: no -> no, yes -> yes
o age 31..40 -> yes
o age > 40 -> split on credit_rating: excellent -> no, fair -> yes
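Like the previous example, this tree reads directly as conditionals. A minimal sketch, with the attribute value strings chosen for illustration:

```python
# The buys_computer tree from the slide, transcribed as if/else rules.
def buys_computer(age, student, credit_rating):
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31..40":
        return "yes"
    # age > 40: split on credit rating.
    return "no" if credit_rating == "excellent" else "yes"

print(buys_computer("<=30", "yes", "fair"))     # -> yes
print(buys_computer(">40", "no", "excellent"))  # -> no
```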
Decision tree induction algorithms
Many algorithms exist:
o Hunt's Algorithm (one of the earliest)
o ID3 (Iterative Dichotomiser 3), C4.5, and C5.0, by Ross Quinlan et al.
o CART (Classification And Regression Trees)
o CHAID (Chi-squared Automatic Interaction Detection)
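What these algorithms share is a greedy, recursive choice of splitting attribute. ID3, for instance, splits on the attribute with the highest information gain. A minimal sketch of that criterion (helper names are illustrative, and this is the bare measure, not a full induction algorithm):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting rows on the attribute at attr_index."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

ID3 would compute this gain for every candidate attribute, split on the best one, and recurse on each branch.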
Naive Bayesian Classification
o Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn).
o Suppose there are m classes C1, C2, ..., Cm. Classification derives the maximum a posteriori class, i.e. the class Ci with maximal P(Ci|X).
o This can be derived from Bayes' theorem: P(Ci|X) = P(X|Ci) P(Ci) / P(X).
o Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized.
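As a sketch, the resulting decision rule, combined with the naive conditional-independence assumption P(X|Ci) = P(x1|Ci) * ... * P(xn|Ci), might look like this (the lookup-table layout is assumed for illustration):

```python
def map_class(x, classes, prior, likelihood):
    """Return the class Ci maximizing P(X|Ci) * P(Ci); P(X) is a common
    factor across classes and can be dropped.
    Assumed layout: prior[c] = P(c); likelihood[c][i][v] = P(x_i = v | c)."""
    def score(c):
        p = prior[c]
        for i, xi in enumerate(x):
            p *= likelihood[c][i][xi]  # naive independence assumption
        return p
    return max(classes, key=score)
```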
Naive Bayesian Classifier: example
Play football? Training data:

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N
Naive Bayesian Classifier: example (contd.)
Splitting the training data by class gives 9 tuples with class P and 5 tuples with class N.
Naive Bayesian Classifier: example (contd.)
Given the training set, we compute the conditional probabilities:

Attribute    Value     P    N
Outlook      sunny     2/9  3/5
             overcast  4/9  0
             rain      3/9  2/5
Temperature  hot       2/9  2/5
             mild      4/9  2/5
             cool      3/9  1/5
Humidity     high      3/9  4/5
             normal    6/9  1/5
Windy        true      3/9  3/5
             false     6/9  2/5

We also have the prior probabilities P(P) = 9/14 and P(N) = 5/14.
Naive Bayesian Classifier: example (contd.)
To classify a new sample X = (outlook = sunny, temperature = cool, humidity = high, windy = false):
Prob(P|X) ∝ Prob(P)*Prob(sunny|P)*Prob(cool|P)*Prob(high|P)*Prob(false|P) = 9/14 * 2/9 * 3/9 * 3/9 * 6/9 ≈ 0.0106
Prob(N|X) ∝ Prob(N)*Prob(sunny|N)*Prob(cool|N)*Prob(high|N)*Prob(false|N) = 5/14 * 3/5 * 1/5 * 4/5 * 2/5 ≈ 0.0137
Since 0.0137 > 0.0106, X takes class label N.
Naive Bayesian Classifier: example (contd.)
Second example: X = (rain, hot, high, false)
P(X|p)*P(p) = P(rain|p)*P(hot|p)*P(high|p)*P(false|p)*P(p) = 3/9 * 2/9 * 3/9 * 6/9 * 9/14 ≈ 0.010582
P(X|n)*P(n) = P(rain|n)*P(hot|n)*P(high|n)*P(false|n)*P(n) = 2/5 * 2/5 * 4/5 * 2/5 * 5/14 ≈ 0.018286
Since 0.018286 > 0.010582, sample X is classified in class N (don't play).
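Both computations can be checked with exact fractions. A minimal sketch that reproduces the numbers above from the probability table (variable names are ours):

```python
from fractions import Fraction as F

# Class-conditional probabilities read off the table above.
cond_p = {"sunny": F(2, 9), "rain": F(3, 9), "hot": F(2, 9), "cool": F(3, 9),
          "high": F(3, 9), "false": F(6, 9)}
cond_n = {"sunny": F(3, 5), "rain": F(2, 5), "hot": F(2, 5), "cool": F(1, 5),
          "high": F(4, 5), "false": F(2, 5)}
prior_p, prior_n = F(9, 14), F(5, 14)

def score(cond, prior, x):
    """Unnormalized posterior: P(class) times the product of P(value|class)."""
    result = prior
    for value in x:
        result *= cond[value]
    return result

x1 = ("sunny", "cool", "high", "false")
print(float(score(cond_p, prior_p, x1)))  # ~0.0106
print(float(score(cond_n, prior_n, x1)))  # ~0.0137 -> class N

x2 = ("rain", "hot", "high", "false")
print(float(score(cond_p, prior_p, x2)))  # ~0.010582
print(float(score(cond_n, prior_n, x2)))  # ~0.018286 -> class N
```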
Clustering algorithms: partitioning methods
o Partitioning method: construct a partition of a database D of n objects into a set of k clusters.
o Given k, find the partition of k clusters that optimizes the chosen partitioning criterion.
o Heuristic methods: the k-means and k-medoids algorithms.
  - k-means (MacQueen '67): each cluster is represented by the center of the cluster.
  - k-medoids, or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw '87): each cluster is represented by one of the objects in the cluster.
The k-means clustering method
Given k, the k-means algorithm is implemented in four steps (see the sketch after this list):
1) Partition the objects into k nonempty subsets.
2) Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e. mean point, of the cluster).
3) Assign each object to the cluster with the nearest seed point.
4) Go back to step 2; stop when no assignments change.
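A minimal sketch of these four steps in Python with NumPy (an assumption: the slides give no code; there is no guard for clusters that become empty):

```python
import numpy as np

def k_means(points, k, seed=0):
    """Sketch of the four steps from the slide; points is an (n, d) array."""
    rng = np.random.default_rng(seed)
    # Step 1: one way to start - pick k distinct objects as initial seeds.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    while True:
        # Step 3: assign each object to the cluster with the nearest seed point.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: recompute each centroid as the mean point of its cluster.
        new = np.array([points[labels == i].mean(axis=0) for i in range(k)])
        # Step 4: repeat until the assignments (hence the centroids) stop changing.
        if np.allclose(new, centroids):
            return labels, centroids
        centroids = new

pts = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]])
labels, centers = k_means(pts, k=2)  # finds the two well-separated clusters
```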
The k-means clustering method: example
(Figure: four 10x10 scatter plots showing successive k-means iterations, with objects reassigned to the nearest centroid and centroids recomputed until the clusters stabilize.)
Apache Mahout
Spark MLlib
Introduction to HADOOP
Any questions / doubts?