
Decision Trees and Ensemble Learning
Explore the concepts of decision trees and ensemble learning, including impurity measures, feature importance, classification techniques, clustering fundamentals, evaluation methods, and meta classifiers. Discover how decision trees can efficiently handle unnormalized datasets and make binary decisions based on features and thresholds.

Presentation Transcript
Decision Trees and Ensemble Learning. Prof V B More, MET's IOE, BKC, Nashik.
Unit 5: Decision Trees and Ensemble Learning. Decision Trees: impurity measures, feature importance. Decision tree classification with scikit-learn. Ensemble learning: Random Forest, AdaBoost, Gradient Tree Boosting, Voting Classifier. Clustering fundamentals: basics, K-means (finding the optimal number of clusters), DBSCAN, Spectral Clustering. Evaluation methods based on ground truth: homogeneity, completeness, Adjusted Rand Index. Introduction to meta classifiers: concepts of weak and eager learners, ensemble methods, bagging, boosting, Random Forests.
Binary decision trees: A binary decision tree is a structure based on a sequential decision process. Starting from the root, a feature is evaluated and one of the two branches is selected. This procedure is repeated until a final leaf is reached, which corresponds to the target classification.
Decision trees can work efficiently with unnormalized datasets because their internal structure is not influenced by the values assumed by each feature.
Plots of an unnormalized bidimensional dataset and the cross-validation scores obtained using a logistic regression and a decision tree: the decision tree always achieves a score close to 1.0, while the logistic regression has an average slightly greater than 0.6.
Binary decisions: Consider an input dataset X in which every vector is made up of m features. Each feature, combined with a threshold, is a candidate for creating a node based on a (feature, threshold) selection tuple.
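The formulas for the dataset and the selection tuple are not reproduced in the transcript. In standard notation consistent with the surrounding text (the symbols n, m, i, and t_k are assumptions):

X = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\}, \quad \bar{x}_i \in \mathbb{R}^m

\sigma = \langle i, t_k \rangle \quad \text{with } 1 \le i \le m

Here i is the index of the feature being evaluated and t_k is the threshold it is compared against; every node of the tree is associated with one such selection tuple.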
According to the feature and threshold, the structure of the tree will change. We should pick the feature that best separates our data; that is, it is necessary to find the feature that minimizes the number of decision steps (i.e., the number of branches).
Consider a class of students where all the male students have dark hair and all the female students have pale hair.
The 'Dark color?' block will contain both males and females (which are the targets that we want to classify). This concept is expressed using the term purity (or its opposite concept, impurity). Ideally, impurity is null, so that further decisions are taken only on the remaining features.
In this example, however, the impurity cannot be null, because there are both male and female students with long hair. The choice of the best threshold is a fundamental element, because it determines the structure of the tree and its performance. The split itself is described by a selection tuple made of a feature index and a threshold.
We can define a total impurity measure by considering the two branches. Here, D is the whole dataset at the selected node, Dleft and Dright are the resulting subsets (obtained by applying the selection tuple), and the I terms are impurity measures.
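The formula itself is missing from the transcript; a common formulation (an assumption, since the exact weighting used on the slide is not shown) weights each branch's impurity by its share of samples:

I(D, \sigma) = \frac{N_{left}}{N_D} I(D_{left}) + \frac{N_{right}}{N_D} I(D_{right})

where N_D, N_{left}, and N_{right} are the numbers of samples in D, D_{left}, and D_{right} respectively.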
Impurity measures: To define the most commonly used impurity measures, we need to consider the total number of target classes, n.
In a certain node j, we can define the probability p(i|j), where i is an index in [1, n] associated with each class. This value is the ratio between the number of samples belonging to class i and the total number of samples belonging to the selected node.
Gini impurity index: The Gini impurity index is defined as follows, where the sum is always extended to all classes. This is a very common measure, and it is used as the default by scikit-learn.
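The definition is not reproduced in the transcript; the standard Gini impurity of a node j is:

I_{Gini}(j) = \sum_{i} p(i|j)\big(1 - p(i|j)\big) = 1 - \sum_{i} p(i|j)^2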
Given a sample, the Gini impurity measures the probability of misclassification if a label is chosen randomly using the probability distribution of the branch. The index reaches its minimum (0.0) when all the samples of a node are classified into a single category.
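A minimal sketch (not part of the original slides) of how the Gini impurity of a node and the weighted impurity of a candidate (feature, threshold) split can be computed; the helper names gini and split_impurity are illustrative:

import numpy as np

def gini(y):
    # p(i|j): relative frequency of each class present at the node
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(x, y, threshold):
    # Weighted Gini impurity of the two branches produced by x <= threshold
    left = x <= threshold
    return left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])

# Toy example: one feature, two balanced classes
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(gini(y))                    # 0.5: uniform class distribution
print(split_impurity(x, y, 3.0))  # 0.0: the split at 3.0 is perfect

With the perfect threshold the weighted impurity drops to 0.0, which is exactly the condition a decision tree looks for when choosing its splits.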
Cross-entropy impurity index: The cross-entropy measure is defined as follows. This measure is based on information theory, and it assumes null values only when samples belonging to a single class are present in a branch.
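The definition is missing from the transcript; the standard cross-entropy impurity of a node j (base-2 logarithm assumed) is:

I_{H}(j) = -\sum_{i} p(i|j) \log_2 p(i|j)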
It is maximum when there is a uniform distribution among classes (which is one of the worst cases in decision trees, because it means that there are still many decision steps until the final classification).
This index is very similar to the Gini impurity, even though the cross-entropy allows you to select the split that minimizes the uncertainty about the classification, while the Gini impurity minimizes the probability of misclassification.
Using the entropy H(X), we can define the information gain provided by a split as the reduction in uncertainty obtained by moving from the parent node to its two children.
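The formulas are not reproduced in the transcript; the standard definition of the information gain of a split \sigma of a dataset D into D_{left} and D_{right} is:

IG(\sigma) = H(D) - \frac{N_{left}}{N_D} H(D_{left}) - \frac{N_{right}}{N_D} H(D_{right})

where H(\cdot) is the entropy (the cross-entropy impurity defined above).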
ID3 decision trees: use of entropy and information gain.
The tree is grown by starting from the node that provides the highest information gain and proceeding until one of the following conditions is verified: all nodes are pure, the information gain is null, or the maximum depth has been reached. A minimal sketch of this procedure follows.
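A compact sketch (not the author's code) of this ID3-style growth: split on the (feature, threshold) pair with the highest information gain and stop when a node is pure, the best gain is zero, or the maximum depth is reached. The names entropy, best_split, and grow are illustrative, and integer class labels are assumed:

import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(X, y):
    # Exhaustively evaluate every (feature, threshold) candidate
    base, best = entropy(y), (None, None, 0.0)
    for i in range(X.shape[1]):
        for t in np.unique(X[:, i]):
            left = X[:, i] <= t
            if left.all() or not left.any():
                continue
            h = left.mean() * entropy(y[left]) + (~left).mean() * entropy(y[~left])
            if base - h > best[2]:
                best = (i, t, base - h)
    return best

def grow(X, y, depth=0, max_depth=3):
    i, t, gain = best_split(X, y)
    # Stop on a pure node, a null information gain, or the maximum depth
    if len(np.unique(y)) == 1 or gain == 0.0 or depth >= max_depth:
        return {'class': int(np.bincount(y).argmax())}
    left = X[:, i] <= t
    return {'feature': i, 'threshold': t,
            'left': grow(X[left], y[left], depth + 1, max_depth),
            'right': grow(X[~left], y[~left], depth + 1, max_depth)}

# Toy usage: two features, two classes, perfectly separable on feature 0
X = np.array([[2.0, 7.0], [3.0, 8.0], [6.0, 1.0], [7.0, 2.0]])
y = np.array([0, 0, 1, 1])
print(grow(X, y))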
Misclassification impurity index: The misclassification impurity is the simplest index, defined as follows. In terms of quality performance, this index is not the best choice because it is not particularly sensitive to different probability distributions.
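The definition is missing from the transcript; the standard misclassification impurity of a node j is:

I_{m}(j) = 1 - \max_{i} p(i|j)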
Feature importance: When a decision tree is grown on a multidimensional dataset, it can be useful to evaluate the importance of each feature in predicting the output values. Decision trees offer a different approach, based on the impurity reduction determined by every single feature.
In particular, considering a feature xi, its importance can be determined as a weighted sum of all the impurity reductions computed at the nodes that split on it, where Nk is the number of samples reaching node k.
The importance is computed considering only the nodes where the feature is actually used to split the samples. If the Gini impurity index is adopted, this measure is called Gini importance.
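The formula is not reproduced in the transcript; a common formulation, consistent with the description above (N is the total number of samples and \Delta I_k is the impurity reduction achieved at node k), is:

Importance(x_i) = \sum_{k\,:\,\text{node } k \text{ splits on } x_i} \frac{N_k}{N} \Delta I_k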
Decision tree classification with scikit-learn: scikit-learn contains the DecisionTreeClassifier class, which can train a binary decision tree with the Gini and cross-entropy impurity measures.
Consider a dataset with three features and three classes:

from sklearn.datasets import make_classification

>>> nb_samples = 500
>>> X, Y = make_classification(n_samples=nb_samples, n_features=3, n_informative=3, n_redundant=0, n_classes=3, n_clusters_per_class=1)
First, consider a classification with the default Gini impurity:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

>>> dt = DecisionTreeClassifier()
>>> print(cross_val_score(dt, X, Y, scoring='accuracy', cv=10).mean())
0.970
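Since DecisionTreeClassifier supports both measures, the same experiment can be repeated with the cross-entropy criterion (a brief sketch reusing X and Y from above; the exact score will vary between runs):

>>> dt_entropy = DecisionTreeClassifier(criterion='entropy')
>>> print(cross_val_score(dt_entropy, X, Y, scoring='accuracy', cv=10).mean())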
To export a trained tree, it is necessary to use the built-in function export_graphviz():

from sklearn.tree import export_graphviz

>>> dt.fit(X, Y)
>>> with open('dt.dot', 'w') as df:
        df = export_graphviz(dt, out_file=df, feature_names=['A', 'B', 'C'], class_names=['C1', 'C2', 'C3'])
In this case, we have used A, B, and C as feature names and C1, C2, and C3 as class names. Once the file has been created, it is possible to convert it to PDF using the command-line tool:

<Graphviz Home>/bin/dot -Tpdf dt.dot -o dt.pdf
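As an alternative to the command-line tool, the dt.dot file can also be rendered from Python (a sketch that assumes the graphviz Python package and the Graphviz binaries are installed):

>>> from graphviz import Source
>>> Source.from_file('dt.dot').render('dt', format='pdf', cleanup=True)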
Decision tree graph: only a part of one branch is shown.
There are two kinds of nodes: 1) Nonterminal nodes, which contain the splitting tuple (as feature <= threshold) and a positive impurity measure; 2) Terminal nodes, where the impurity measure is null and a final target class is present.
We can find features that separate a large number of samples; their importance must therefore be higher than that of features that only act close to the terminal nodes. In scikit-learn, it is possible to assess the Gini importance of each feature after training a model:
>>> import numpy as np
>>> dt.feature_importances_
array([ 0.12066952, 0.12532507, 0.0577379 , 0.14402762, 0.14382398, 0.12418921, 0.14638565, 0.13784106])
>>> np.argsort(dt.feature_importances_)
array([2, 0, 5, 1, 7, 4, 3, 6], dtype=int64)

(Note that this output comes from a model trained on a dataset with eight features, not the three-feature example above.)
The most important features are 6, 3, 4, and 7, while feature 2, for example, separates a very small number of samples and can be considered noninformative for the classification task.
In terms of efficiency, a tree can also be pruned using the max_depth parameter. In addition, the max_features parameter controls how many features are considered when looking for each split (a brief usage sketch follows the list below):
If it is a number, the value is directly taken into account at each split. If it is 'auto' or 'sqrt', the square root of the number of features will be adopted. If it is 'log2', the logarithm (base 2) will be used. If it is None, all the features will be used (this is the default value).
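A brief usage sketch (reusing X and Y from the earlier example; the values 5 and 'sqrt' are only illustrative):

>>> dt_pruned = DecisionTreeClassifier(max_depth=5, max_features='sqrt')
>>> print(cross_val_score(dt_pruned, X, Y, scoring='accuracy', cv=10).mean())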