Ensemble Learning with Decision Trees and Boosting Methods

Explore the concept of ensemble learning through decision trees and boosting methods in this comprehensive guide. Learn about techniques for improving classification accuracy, such as bagged trees and boosted trees like AdaBoost and Gradient Tree Boosting.

  • Ensemble Learning
  • Decision Trees
  • Boosting Methods
  • Random Forests
  • Classification


Presentation Transcript


  1. Ensemble Learning. Prof V B More, MET's IOE BKC Nashik.

  2. Unit 5: Decision Trees and Ensemble Learning. Decision Trees: impurity measures, feature importance; decision tree classification with Scikit-learn. Ensemble Learning: Random Forest, AdaBoost, Gradient Tree Boosting, Voting Classifier. Clustering Fundamentals: basics; K-means: finding the optimal number of clusters; DBSCAN; Spectral Clustering. Evaluation methods based on ground truth: homogeneity, completeness, Adjusted Rand Index. Introduction to Meta Classifiers: concepts of weak and eager learners, ensemble methods, bagging, boosting, random forests.

  3. Ensemble Learning: So far we have seen techniques that predict the class labels of unknown examples using a single classifier induced from the training data. Such a classifier works for a specific purpose.

  4. Ensemble Learning: Now we look at techniques for improving classification accuracy by aggregating the predictions of multiple classifiers. These techniques are known as ensemble or classifier combination methods.

  5. Ensemble Learning: An ensemble method constructs a set of base classifiers from the training data and performs classification by taking a majority vote, or by averaging the results, of the individual base classifiers. This approach is based on a set of weak learners that can be trained in parallel or sequentially.
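
  As an illustrative aside (not part of the original slides), scikit-learn's VotingClassifier implements exactly this majority-vote combination; the base estimators below are arbitrary choices:

  # A minimal sketch of majority-vote ensembling; the base estimators
  # are illustrative choices, not prescribed by the slides.
  from sklearn.datasets import load_digits
  from sklearn.ensemble import VotingClassifier
  from sklearn.linear_model import LogisticRegression
  from sklearn.naive_bayes import GaussianNB
  from sklearn.tree import DecisionTreeClassifier
  from sklearn.model_selection import cross_val_score

  digits = load_digits()
  ensemble = VotingClassifier(
      estimators=[('lr', LogisticRegression(max_iter=5000)),
                  ('dt', DecisionTreeClassifier()),
                  ('nb', GaussianNB())],
      voting='hard')  # 'hard' = majority vote; 'soft' = average probabilities
  print(cross_val_score(ensemble, digits.data, digits.target, cv=10).mean())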

  6. Ensemble Learning: These methods can be classified into two main categories. Bagged (or bootstrap) trees: in this case, the ensemble is built completely, with each tree trained independently of the others. The training process is based on a random selection of the splits (partitions), and the predictions are based on a majority vote. Random forests are an example of bagged tree ensembles; a sketch of plain bagging follows below.
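
  A minimal sketch of bagging itself, assuming scikit-learn's BaggingClassifier (parameter values are illustrative):

  # Bagged decision trees: each tree is fit on a bootstrap sample of the
  # training data and predictions are aggregated by voting.
  from sklearn.datasets import load_digits
  from sklearn.ensemble import BaggingClassifier
  from sklearn.tree import DecisionTreeClassifier
  from sklearn.model_selection import cross_val_score

  digits = load_digits()
  bagged = BaggingClassifier(DecisionTreeClassifier(),  # base learner
                             n_estimators=50,           # illustrative value
                             bootstrap=True)
  print(cross_val_score(bagged, digits.data, digits.target, cv=10).mean())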

  7. Ensemble Learning: Boosted trees: the ensemble is built sequentially, focusing on the samples that have been previously misclassified. Examples of boosted trees are AdaBoost and Gradient Tree Boosting.
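
  Both families are available in scikit-learn; a minimal sketch (hyperparameter values are illustrative):

  # Boosted trees: estimators are added sequentially, each one giving
  # more weight to the samples the previous ones misclassified.
  from sklearn.datasets import load_digits
  from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
  from sklearn.model_selection import cross_val_score

  digits = load_digits()
  for model in (AdaBoostClassifier(n_estimators=50),
                GradientBoostingClassifier(n_estimators=50)):
      print(type(model).__name__,
            cross_val_score(model, digits.data, digits.target, cv=10).mean())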

  8. Random Forests: A random forest is a set of decision trees built on random samples, with a different policy for splitting a node: instead of looking for the best choice among all features, a random subset of features is used for each tree, trying to find the threshold that best separates the data.
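
  In scikit-learn, the size of this random feature subset is controlled by the max_features parameter of RandomForestClassifier; a minimal sketch (values are illustrative):

  # Each split considers only a random subset of features;
  # max_features='sqrt' uses sqrt(n_features) candidates per split.
  from sklearn.datasets import load_digits
  from sklearn.ensemble import RandomForestClassifier

  digits = load_digits()
  rf = RandomForestClassifier(n_estimators=30, max_features='sqrt')
  rf.fit(digits.data, digits.target)
  print(rf.predict(digits.data[:5]))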

  9. Random Forests: As a result, many trees are trained in a weaker way, and each of them produces a different prediction. There are two ways to interpret these results; the more common approach is based on a majority vote (the most-voted class is considered correct).

  10. Random Forests: The second approach is based on averaging the results, which yields very accurate predictions (scikit-learn's default approach). Even if the two are theoretically different, the probabilistic average of a trained random forest cannot be very different from the majority of predictions, so the two methods often yield similar results.
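
  To make the two interpretations concrete, here is a sketch (not from the slides) comparing scikit-learn's probability-averaging predict with an explicit majority vote over a fitted forest's trees:

  import numpy as np
  from sklearn.datasets import load_digits
  from sklearn.ensemble import RandomForestClassifier

  digits = load_digits()
  rf = RandomForestClassifier(n_estimators=30).fit(digits.data, digits.target)

  avg_pred = rf.predict(digits.data[:10])  # averages per-tree probabilities
  # Explicit majority vote over the individual trees; the digits labels are
  # already 0-9, so no mapping through rf.classes_ is needed here.
  votes = np.array([tree.predict(digits.data[:10]) for tree in rf.estimators_])
  vote_pred = np.apply_along_axis(
      lambda c: np.bincount(c.astype(int)).argmax(), 0, votes)
  print(avg_pred, vote_pred)  # the two usually agree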

  11. Ensemble Learning: E.g., consider the MNIST digits dataset with random forests made of a different number of trees:

  from sklearn.datasets import load_digits
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score

  >>> digits = load_digits()
  >>> nb_classifications = 100
  >>> rf_accuracy = []
  >>> for i in range(1, nb_classifications):
  ...     # mean 10-fold cross-validated accuracy of a forest with i trees
  ...     a = cross_val_score(RandomForestClassifier(n_estimators=i),
  ...                         digits.data, digits.target,
  ...                         scoring='accuracy', cv=10).mean()
  ...     rf_accuracy.append(a)

  12. Resulting plot of random forests made of a different number of trees: [plot: 10-fold CV accuracy vs. number of trees]
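
  A sketch of how such a plot can be reproduced with matplotlib, assuming the nb_classifications and rf_accuracy variables computed on slide 11:

  # Plot cross-validated accuracy against the number of trees.
  import matplotlib.pyplot as plt

  plt.plot(range(1, nb_classifications), rf_accuracy)
  plt.xlabel('Number of trees')
  plt.ylabel('10-fold CV accuracy')
  plt.show()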

  13. Random Forests: The accuracy is low when the number of trees is under a minimum threshold; it then starts increasing rapidly with fewer than 10 trees. A value between 20 and 30 trees yields the optimal result (95%), which is higher than for a single decision tree.
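
  The comparison with a single tree can be checked directly; a sketch (exact scores vary with the folds):

  # Compare a single decision tree with a 30-tree random forest.
  from sklearn.datasets import load_digits
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import cross_val_score
  from sklearn.tree import DecisionTreeClassifier

  digits = load_digits()
  for model in (DecisionTreeClassifier(),
                RandomForestClassifier(n_estimators=30)):
      print(type(model).__name__,
            cross_val_score(model, digits.data, digits.target, cv=10).mean())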

  14. Random Forests: When the number of trees is low, the variance of the model is very high and the averaging process produces many incorrect results; increasing the number of trees reduces the variance and lets the model converge to a very stable solution.

  15. Random Forests: Scikit-learn also offers a variant that enhances the randomness when selecting the best threshold. Using the ExtraTreesClassifier class, it is possible to implement a model that randomly computes thresholds and picks the best one.

  16. Ensemble Learning: To further reduce the variance:

  from sklearn.ensemble import ExtraTreesClassifier

  >>> nb_classifications = 100
  >>> et_accuracy = []
  >>> for i in range(1, nb_classifications):
  ...     # thresholds are drawn at random before picking the best split
  ...     a = cross_val_score(ExtraTreesClassifier(n_estimators=i),
  ...                         digits.data, digits.target,
  ...                         scoring='accuracy', cv=10).mean()
  ...     et_accuracy.append(a)

  (digits and cross_val_score are as defined on slide 11.)

  17. Resulting plot of the extra-trees classifier shows an improvement in accuracy: [plot: 10-fold CV accuracy vs. number of trees]
