Understanding Decision Trees for Machine Learning

Explore the fundamentals of decision trees in machine learning, where data is categorized based on yes or no questions to make predictions effectively. Learn about the structure and working principles of decision trees and their significance in predictive modeling. Discover how decision trees can help in identifying high-risk individuals for relapse in substance abuse treatment centers. Join the Class 5 session by Kelsey Emnett to delve deeper into this essential machine learning concept.

  • Machine Learning
  • Decision Trees
  • Predictive Modeling
  • Kelsey Emnett
  • Data Analysis


Presentation Transcript


  1. Machine Learning for Beginners: Class 5 By Kelsey Emnett

  2. Links Files located here: bit.ly/machine-learning-introduction. Class videos will be posted by the Sunday after class on my website www.kelseyemnett.com and my YouTube channel https://www.youtube.com/channel/UCR7ejcDyKykQ3wE-KS08Smg.

  3. Class Machine Learning Example Data: Treatment Episode Data Set (available here); includes every admission to a substance abuse treatment center that is publicly funded by the government. Trying to Predict: which admissions are first-time admissions and which were repeat admissions. Goal: to predict which individuals are at high risk for relapse upon admission to a substance abuse treatment facility.

  4. Decision Trees Overview

  5. Decision Trees Overview A Decision Tree model is a classification model. It splits your data points into progressively smaller groups based on a series of yes or no questions about the model's features. Questions can be based on categorical or numeric variables, and they follow a tree-like structure. Image Source: Machine Learning with R, 3rd Edition by Brett Lantz
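To make the idea concrete, here is a minimal sketch of fitting and using a decision tree classifier in scikit-learn (the library the hyperparameter slide below references); the iris dataset and settings are illustrative assumptions, not part of the class example.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Illustrative data: any feature matrix X and label vector y would work
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fit a tree of yes/no questions on the training features
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train, y_train)

    # Each prediction follows one path of questions from the root down to a leaf
    print(tree.predict(X_test[:5]))
    print(tree.score(X_test, y_test))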

  6. Decision Trees Structure Decision trees use the following structure: The root node is the first node that begins the decision tree. The data flows through decision nodes, where your data is narrowed down into progressively smaller, more specific groups. Leaf nodes, or terminal nodes, are the final prediction based on all of the yes or no questions asked in prior levels. Image Source: Machine Learning with R, 3rd Edition by Brett Lantz
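One way to see the root node, decision nodes, and leaf nodes described above is to print a fitted tree as text; this is a small sketch using scikit-learn's export_text helper on an assumed iris example.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

    # The first question printed is the root node, indented questions are decision
    # nodes, and lines ending in "class: ..." are the leaf (terminal) nodes
    print(export_text(clf, feature_names=list(iris.feature_names)))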

  7. Decision Trees Overview The model continues to split the data until one of the following conditions is met: the leaf nodes or resulting groups are homogeneous (all the same class); there are no remaining features with which to make splits; or a pre-defined stopping criterion set in the model's hyperparameters is reached. With any tree-based model, you do not need to rescale your features or use dummy coding while pre-processing your data. Dummy coding your data with these types of models may decrease model accuracy.
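Because splits are simple threshold questions, rescaling the features does not change the tree. This small check is an assumed illustration, not from the slides: it fits the same tree on raw and rescaled features and compares the predictions.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Same model fit on the raw features and on features multiplied by 1000
    raw = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)
    rescaled = DecisionTreeClassifier(random_state=0).fit(X * 1000.0, y).predict(X * 1000.0)

    # The split thresholds simply scale with the data, so this should print True
    print(np.array_equal(raw, rescaled))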

  8. Choosing Splits: Information Gain Decision Tree models decide which split to make by finding the information gain, the change in impurity, from the current group to each potential split choice. Gini Impurity is the most common metric for assessing information gain; another common option is entropy. Gini impurity and entropy often yield very similar results. Example calculations of Gini Impurity and Entropy are on the next three slides.
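In scikit-learn the impurity metric is set with the criterion hyperparameter. The sketch below is illustrative (the dataset and settings are assumptions): it fits the same tree with Gini impurity and with entropy to show how similar the results usually are.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Compare the two common impurity metrics; the scores are usually very close
    for criterion in ("gini", "entropy"):
        clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
        print(criterion, round(cross_val_score(clf, X, y, cv=5).mean(), 3))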

  9. Gini Impurity Gini impurity ranges from 0 to 0.5 for two classes. Example: 6 blue balls and 4 black balls. For each color, multiply the chance of selecting that color by the chance of selecting the wrong color. For blue balls: 6/10 chance of selecting blue and 4/10 chance of selecting the wrong color, 6/10*4/10 = 24/100. Repeat this process for all colors. For black balls: 4/10 chance of selecting black and 6/10 chance of selecting the wrong color, 4/10*6/10 = 24/100. Add the results together: 24/100 + 24/100 = 48/100, so the impurity is 0.48. Source: https://towardsdatascience.com/impurity-judging-splits-how-a-decision-tree-works-235f2e9e63b7
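The same arithmetic can be written as a small helper function; this is a plain-Python sketch of the calculation above, with gini_impurity as an illustrative name.

    def gini_impurity(counts):
        """Sum over classes of p(class) * p(wrong class)."""
        total = sum(counts)
        return sum((c / total) * (1 - c / total) for c in counts)

    # 6 blue balls and 4 black balls: 6/10*4/10 + 4/10*6/10 = 0.48
    print(gini_impurity([6, 4]))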

  10. Choosing Splits: Gini Impurity Example Example: Predicting whether you are going to fall based on how slippery the floor is and how slippery the grip on your shoes is. Starting group: 6 falls and 19 no falls out of 25 total points. Starting impurity: 19/25 chance of selecting No Fall and a 6/25 chance of misclassifying, 19/25*6/25 = 114/625; 6/25 chance of selecting Fall and a 19/25 chance of misclassifying, 6/25*19/25 = 114/625. Total is 114/625 + 114/625 = 228/625, or 0.3648. Source: https://towardsdatascience.com/impurity-judging-splits-how-a-decision-tree-works-235f2e9e63b7

  11. Choosing Splits: Gini Impurity Example Candidate split at a feature value of 1.5. Group above 1.5: 1/15 chance of selecting No Fall and 14/15 chance of selecting the wrong label, 1/15*14/15 = 14/225; 14/15 chance of selecting Fall and 1/15 chance of selecting the wrong label, 14/15*1/15 = 14/225. Total is 28/225. Group below 1.5: 5/10 chance of selecting No Fall and a 5/10 chance of misclassifying, 5/10*5/10 = 25/100; 5/10 chance of selecting Fall and a 5/10 chance of misclassifying, 5/10*5/10 = 25/100. Total is 50/100. Source: https://towardsdatascience.com/impurity-judging-splits-how-a-decision-tree-works-235f2e9e63b7

  12. Choosing Splits: Gini Impurity Example Summed total: The total number of data points is 25. The group above 1.5 has 15 data points, so its impurity is weighted by 15/25; the group below 1.5 is weighted by 10/25. 28/225*15/25 + 50/100*10/25 = 0.0747 + 0.2000 = 0.2747. Reduction in impurity: The reduction in impurity is the starting group's Gini impurity minus the weighted sum of the impurities from the resulting split groups: 0.3648 - 0.2747 = 0.0901. Source: https://towardsdatascience.com/impurity-judging-splits-how-a-decision-tree-works-235f2e9e63b7
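Putting slides 10 through 12 together, this short plain-Python sketch reproduces the worked example (the gini helper name is illustrative): the starting impurity, the impurity of each split group, the weighted sum, and the reduction in impurity.

    def gini(counts):
        total = sum(counts)
        return sum((c / total) * (1 - c / total) for c in counts)

    start = gini([19, 6])     # starting group: 19 No Fall, 6 Fall -> 0.3648
    above = gini([1, 14])     # group above 1.5: 1 No Fall, 14 Fall -> 28/225
    below = gini([5, 5])      # group below 1.5: 5 No Fall, 5 Fall  -> 0.5

    # Weight each group's impurity by its share of the 25 data points
    weighted = above * 15 / 25 + below * 10 / 25   # about 0.2747
    print(round(start - weighted, 4))              # reduction: about 0.0901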

  14. Entropy Entropy is a measure of disorder. It usually ranges from 0 to 1. An entropy of 0 means all observations are of the same class; the closer to 1, the higher the level of disorder. Example: 6 blue balls and 4 black balls. To calculate entropy you would use the following calculation: -4/10*log2(4/10) - 6/10*log2(6/10) = -0.4*(-1.322) - 0.6*(-0.737) = 0.5288 + 0.4422 = 0.971. Entropy equals 0.971. Source: https://towardsdatascience.com/impurity-judging-splits-how-a-decision-tree-works-235f2e9e63b7
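The entropy calculation can be written the same way; a minimal plain-Python sketch of the formula above (the entropy function name is illustrative).

    import math

    def entropy(counts):
        """Negative sum over classes of p * log2(p)."""
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c)

    # 6 blue balls and 4 black balls -> about 0.971
    print(round(entropy([6, 4]), 3))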

  15. Regression Trees Decision trees are used for classification models. Regression trees are built in a similar way to decision trees, except splits are evaluated using the change in variance, standard deviation, or absolute deviation from the mean. The predicted value will be the mean of the training data samples placed in each leaf. Source: https://towardsdatascience.com/impurity-judging-splits-how-a-decision-tree-works-235f2e9e63b7
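A hedged sketch of the regression case in scikit-learn, assuming a recent version of the library (the dataset and settings are illustrative): the criterion controls how splits are scored, and with squared error each prediction is the mean target of the chosen leaf.

    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # criterion="squared_error" scores splits by the reduction in variance;
    # "absolute_error" is also available in recent scikit-learn versions
    reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3, random_state=0)
    reg.fit(X_train, y_train)

    # Each prediction is the mean target of the training samples in that leaf
    print(reg.predict(X_test[:5]))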

  16. Preventing Overfitting The biggest weakness of decision trees is their tendency to overfit. There are two ways to reduce the likelihood of overfitting and improve your decision tree models. Pruning: Reducing the size of your decision tree after its creation; the nodes and branches with little effect on the classification are removed. Early Stopping: Stop the training of your decision tree once it has met criteria set before training begins. Pruning tends to be more effective because it is difficult to judge the optimal size of a decision tree before training.
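A hedged sketch of both ideas in scikit-learn (the dataset, depth, and choice of pruning strength are assumptions for illustration): early stopping via max_depth set before training, and pruning via cost-complexity pruning (ccp_alpha) after growing a full tree.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Early stopping: limit the tree's size before training begins
    early = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

    # Pruning: compute the cost-complexity pruning path, then refit with a chosen
    # alpha; in practice alpha would be tuned, e.g. with cross-validation
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
    alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
    pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

    print(early.score(X_test, y_test), pruned.score(X_test, y_test))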

  17. Greedy Learner Decision trees are greedy learners. They make each split one at a time, based on which split increases information gain for the next step only. They can miss more nuanced criteria that may, as a whole, do a better job predicting a value. In other words, a decision tree is permanently limited by its past decisions.

  18. Decision Tree Hyperparameters
     Max Depth: The maximum depth (i.e., number of levels) of the tree. Decreasing this number reduces overfitting.
     Minimum Samples Per Split: The minimum number of observations for a split on a decision node to occur. Increasing this number reduces overfitting.
     Minimum Samples Per Leaf: The minimum number of observations in a final leaf node. Increasing this number reduces overfitting.
     Minimum Weighted Fraction Per Leaf: The fraction of the total samples required to be in a leaf node; if sample weights are provided, the fraction is computed from the weights rather than raw counts. A higher value reduces overfitting, and sample weights also help when there is class imbalance by giving less weight to the dominant class.
     Maximum Features: The maximum number of features to consider when looking for the best split. Decreasing this number reduces overfitting.
     Maximum Number of Leaf Nodes: The maximum number of leaf nodes in the final model. Decreasing this number reduces overfitting.
     Minimum Impurity Decrease: The minimum amount by which the impurity must decrease for a split to be made. Increasing this number reduces overfitting.
     Source: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
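For reference, the scikit-learn parameter names corresponding to the table are shown below; the specific values are illustrative assumptions, not recommendations.

    from sklearn.tree import DecisionTreeClassifier

    clf = DecisionTreeClassifier(
        max_depth=5,                   # Max Depth
        min_samples_split=20,          # Minimum Samples Per Split
        min_samples_leaf=10,           # Minimum Samples Per Leaf
        min_weight_fraction_leaf=0.01, # Minimum Weighted Fraction Per Leaf
        max_features="sqrt",           # Maximum Features
        max_leaf_nodes=30,             # Maximum Number of Leaf Nodes
        min_impurity_decrease=0.001,   # Minimum Impurity Decrease
        random_state=0,
    )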
