Decision Trees and Naive Bayes in Classification Methods


Learn about the fundamentals of classification methods such as Decision Trees and Naive Bayes in data analytics. Explore how decision tree algorithms work, the concept of node depth, and the application of these methods in predicting customer behavior.

  • Data Analytics
  • Classification Methods
  • Decision Trees
  • Naive Bayes


Presentation Transcript


  1. Data Analytics UNIT-IV: Classification

  2. Chapter Sections. Decision trees: overview, general algorithm, decision tree algorithms, evaluating a decision tree. Naïve Bayes: Bayes' theorem, the naïve Bayes classifier, smoothing, diagnostics. Diagnostics of classifiers. Additional classification methods.

  3. Classification. Classification is widely used for prediction. Most classification methods are supervised. This chapter focuses on two fundamental classification methods: decision trees and naïve Bayes.

  4. Decision Trees. A tree structure specifies a sequence of decisions: given input X = {x1, x2, ..., xn}, predict output Y. Input attributes/features can be categorical or continuous. A node tests a particular input variable; the tree consists of a root node, internal nodes, and leaf nodes, where the leaf nodes return class labels. The depth of a node is the minimum number of steps required to reach it. A branch (connecting two nodes) specifies a decision. There are two varieties of decision trees: classification trees (categorical output, often binary) and regression trees (numeric output).

  5. Decision Trees: Overview of a Decision Tree. Example of a decision tree that predicts whether customers will buy a product.

  6. Decision Trees: Overview of a Decision Tree. Example: will a bank client subscribe to a term deposit? (A worked sketch of this example in R follows.)
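
As a hedged illustration of this example, the sketch below fits a classification tree with the rpart package on the bank-sample.csv file used in the R code later in this deck; the "subscribed" output column is taken from that code, and everything else about the file is assumed.

  # A minimal sketch, assuming bank-sample.csv has a categorical output
  # column "subscribed" plus client attributes (as in the slide-31 code).
  library(rpart)

  banktrain <- read.table("bank-sample.csv", header = TRUE, sep = ",")
  banktrain$subscribed <- as.factor(banktrain$subscribed)  # factor output for classification

  fit <- rpart(subscribed ~ ., data = banktrain, method = "class")
  print(fit)                                  # text form of the fitted tree
  head(predict(fit, banktrain, type = "class"))  # predicted class labels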

  7. Decision Trees: The General Algorithm. Construct a tree T from training set S; this requires a measure of attribute information. Simplistic method (using the data from the previous figure): purity = probability of the corresponding class, e.g., P(no) = 1789/2000 = 89.45%, P(yes) = 10.55%. Entropy methods: entropy measures the impurity of an attribute; information gain measures the reduction in impurity achieved by splitting on an attribute.

  8. Decision Trees: The General Algorithm. Entropy methods of attribute information: the entropy of X is H_X = -Σ_x P(x) log2 P(x). Information gain of an attribute = base entropy minus conditional entropy.
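
A quick sketch of these quantities in R, using the 1789/211 class counts quoted on the previous slide; the attribute split counts are made up purely to illustrate the computation.

  entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }  # 0*log2(0) taken as 0

  base <- entropy(c(1789, 211) / 2000)      # entropy of the output, ~0.49 bits

  # Hypothetical binary attribute A splitting the same 2000 records:
  left  <- c(1500, 100)                     # A = a1: counts of no / yes
  right <- c( 289, 111)                     # A = a2: counts of no / yes
  cond  <- (sum(left)  / 2000) * entropy(left  / sum(left)) +
           (sum(right) / 2000) * entropy(right / sum(right))

  info_gain <- base - cond                  # base entropy minus conditional entropy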

  9. Decision Trees: The General Algorithm. Construct a tree T from training set S: choose as the root node the most informative attribute A; partition S according to A's values; construct subtrees T1, T2, ... for the subsets of S recursively until one of the following occurs: all leaf nodes satisfy a minimum purity threshold; the tree cannot be split further under the minimum purity threshold; some other stopping criterion is satisfied (e.g., maximum depth).

  10. Decision Trees: Decision Tree Algorithms. ID3 algorithm, where T = training set, P = output variable, A = attribute. (An R sketch of the algorithm follows below.)
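
Since the ID3 pseudocode itself did not survive the transcript, here is a hedged R sketch of ID3-style construction for categorical data; id3() and info_gain() are illustrative helper names, not from the original deck.

  entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }

  info_gain <- function(S, output, a) {            # base minus conditional entropy
    base <- entropy(table(S[[output]]) / nrow(S))
    cond <- sum(sapply(split(S, S[[a]]), function(part)
      (nrow(part) / nrow(S)) * entropy(table(part[[output]]) / nrow(part))))
    base - cond
  }

  id3 <- function(S, output, attrs) {
    labels <- S[[output]]
    if (length(unique(labels)) == 1 || length(attrs) == 0)
      return(names(which.max(table(labels))))      # leaf: pure or majority class
    best <- attrs[which.max(sapply(attrs, function(a) info_gain(S, output, a)))]
    node <- list(attribute = best, children = list())
    for (v in unique(as.character(S[[best]]))) {   # one branch per value of A
      part <- S[S[[best]] == v, , drop = FALSE]
      node$children[[v]] <- id3(part, output, setdiff(attrs, best))
    }
    node
  }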

  11. Decision Trees: Decision Tree Algorithms. C4.5 algorithm: handles missing data, handles both categorical and continuous variables, and uses bottom-up pruning to address overfitting. CART (Classification And Regression Trees): also handles continuous variables; uses the Gini diversity index as its information measure.
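
In R, the rpart package implements the CART approach, and its split criterion can be switched between the Gini index (the default) and an entropy-based information measure; this sketch assumes the banktrain frame from the earlier example.

  library(rpart)
  fit_gini <- rpart(subscribed ~ ., data = banktrain, method = "class",
                    parms = list(split = "gini"))         # CART's default
  fit_info <- rpart(subscribed ~ ., data = banktrain, method = "class",
                    parms = list(split = "information"))  # entropy-based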

  12. Decision Trees: Evaluating a Decision Tree. Decision trees are greedy algorithms: they take the best option at each step, which may not be best overall; this is addressed by ensemble methods such as random forest. The model might overfit the data (figure: blue = training-set error, red = test-set error). To overcome overfitting: stop growing the tree early, or grow the full tree and then prune it.
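
A sketch of the grow-then-prune strategy with rpart, again assuming the banktrain frame from earlier: grow an overly large tree, inspect the cross-validated error, and prune back to the best subtree.

  library(rpart)
  full <- rpart(subscribed ~ ., data = banktrain, method = "class",
                cp = 0)                       # cp = 0: grow with no complexity penalty
  printcp(full)                               # cross-validated error per subtree
  best_cp <- full$cptable[which.min(full$cptable[, "xerror"]), "CP"]
  pruned  <- prune(full, cp = best_cp)        # keep the subtree with lowest xerror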

  13. Decision Trees Evaluating a Decision Tree Decision trees -> rectangular decision regions

  14. Decision Trees: Evaluating a Decision Tree. Advantages of decision trees: computationally inexpensive; outputs are easy to interpret as a sequence of tests; they show the importance of each input variable. Decision trees handle both numerical and categorical attributes, categorical attributes with many distinct values, variables with nonlinear effects on the outcome, and variable interactions.

  15. Decision Trees: Evaluating a Decision Tree. Disadvantages of decision trees: sensitive to small variations in the training data; overfitting can occur because each split reduces the training data available for subsequent splits; they perform poorly if the dataset contains many irrelevant variables.

  16. Chapter Sections. Decision trees: overview, general algorithm, decision tree algorithms, evaluating a decision tree. Naïve Bayes: Bayes' theorem, the naïve Bayes classifier, smoothing, diagnostics. Diagnostics of classifiers. Additional classification methods.

  17. Naïve Bayes. The naïve Bayes classifier is based on Bayes' theorem (or Bayes' law) and assumes the features contribute independently. Features (variables) are generally categorical; discretization converts continuous variables into categorical ones. The output is usually a class label plus a probability score; the log probability is often used instead of the probability.

  18. Naïve Bayes: Bayes' Theorem. Bayes' theorem: P(C|A) = P(A|C) P(C) / P(A), where C = class and A = observed attributes. Typical medical example, used because doctors frequently get this wrong.
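
The medical example referenced here did not survive the transcript, so the numbers below are made up purely to show the standard calculation: a rare disease plus an imperfect test.

  p_c      <- 0.01    # P(C): prevalence of the disease (assumed)
  p_a_c    <- 0.99    # P(A|C): test positive given disease (assumed)
  p_a_notc <- 0.05    # P(A|~C): false-positive rate (assumed)

  p_a   <- p_a_c * p_c + p_a_notc * (1 - p_c)   # total probability of a positive test
  p_c_a <- p_a_c * p_c / p_a                    # Bayes' theorem
  p_c_a                                         # ~0.17: far lower than most people guess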

  19. Naïve Bayes: Naïve Bayes Classifier. Under the conditional independence assumption, P(A|cj) = P(a1|cj) * P(a2|cj) * ... * P(am|cj). Dropping the common denominator P(A), we get P(cj|A) proportional to P(cj) * prod_i P(ai|cj). Find the cj that maximizes P(cj|A).

  20. Naïve Bayes: Naïve Bayes Classifier. Example: given the following record from a bank client, is this client likely to subscribe to the term deposit?

  21. Naïve Bayes: Naïve Bayes Classifier. Compute the probabilities for this record.

  22. Naïve Bayes: Naïve Bayes Classifier. Compute the naïve Bayes classifier outputs (yes/no): the client is assigned the label subscribed = yes. The scores are small, but the ratio is what counts; using logarithms helps avoid numerical underflow.
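
A hedged sketch of this by-hand score in R; the record's attribute names and values are illustrative stand-ins, not the deck's actual record.

  banktrain <- read.table("bank-sample.csv", header = TRUE, sep = ",")
  record <- list(job = "management", marital = "married", default = "no")  # hypothetical

  score_class <- function(cls) {
    part  <- banktrain[banktrain$subscribed == cls, ]
    prior <- nrow(part) / nrow(banktrain)                        # P(cj)
    conds <- sapply(names(record),
                    function(a) mean(part[[a]] == record[[a]]))  # P(ai|cj)
    log(prior) + sum(log(conds))     # log score avoids numerical underflow
  }

  scores <- c(yes = score_class("yes"), no = score_class("no"))
  names(which.max(scores))           # assign the label with the larger (log) score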

  23. Naïve Bayes: Smoothing. A smoothing technique assigns a small nonzero probability to rare events that are missing in the training data. E.g., Laplace smoothing assumes every output occurs once more than it actually occurs in the dataset. Smoothing is essential: without it, a single zero conditional probability forces P(cj|A) = 0.
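
With the e1071 package (used later in this deck), Laplace smoothing is a single argument: laplace = 1 adds one pseudo-count per attribute value, so no conditional probability is exactly zero.

  library(e1071)
  nb_smoothed <- naiveBayes(subscribed ~ ., data = banktrain, laplace = 1)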

  24. Naïve Bayes: Diagnostics. Naïve Bayes advantages: handles missing values; robust to irrelevant variables; simple to implement; computationally efficient; handles high-dimensional data efficiently; often competitive with other learning algorithms; reasonably resistant to overfitting. Naïve Bayes disadvantages: assumes variables are conditionally independent, and is therefore sensitive to double counting correlated variables; in its simplest form, it is used only for categorical variables.

  25. Naïve Bayes: Naïve Bayes in R. This section explores two methods of using the naïve Bayes classifier: manually computing the probabilities from scratch (tedious, with many R calculations), or using the naiveBayes function from the e1071 package (much easier; starts on page 222). Example: subscribing to the term deposit.
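
A minimal sketch of the easier e1071 route: fit once, then request either hard labels or raw posterior scores. The banktrain and banktest frames are assumed to be those loaded in the slide-31 code below.

  library(e1071)
  nb_model <- naiveBayes(subscribed ~ ., data = banktrain)
  predict(nb_model, banktest, type = "class")   # class labels
  predict(nb_model, banktest, type = "raw")     # per-class probability scores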

  26. Chapter Sections. Decision trees: overview, general algorithm, decision tree algorithms, evaluating a decision tree. Naïve Bayes: Bayes' theorem, the naïve Bayes classifier, smoothing, diagnostics. Diagnostics of classifiers. Additional classification methods.

  27. Diagnostics of Classifiers. The book covered three classifiers: logistic regression, decision trees, and naïve Bayes. Tools to evaluate classifier performance: the confusion matrix.

  28. Diagnostics of Classifiers. Bank marketing example: training set of 2000 records; test set of 100 records, evaluated below.

  29. Diagnostics of Classifiers. Evaluation metrics (sketched in code below).
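
Since the metric definitions were on a lost image, this sketch computes the usual ones from a confusion matrix; it assumes the nb_model and banktest objects built in the slide-31 code below.

  predicted <- predict(nb_model, banktest, type = "class")
  actual    <- banktest$subscribed
  cm <- table(predicted, actual)                # confusion matrix

  TP <- cm["yes", "yes"]; FP <- cm["yes", "no"]
  FN <- cm["no",  "yes"]; TN <- cm["no",  "no"]

  accuracy  <- (TP + TN) / sum(cm)
  precision <- TP / (TP + FP)
  recall    <- TP / (TP + FN)                   # also called the true positive rate
  fpr       <- FP / (FP + TN)                   # false positive rate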

  30. Diagnostics of Classifiers. Evaluation metrics applied to the bank marketing 100-record test set (two of the reported metrics are poor).

  31. Diagnostics of Classifiers. ROC curve: good for evaluating binary detection. Bank marketing: 2000-record training set + 100-record test set.

  > library(e1071)  # provides naiveBayes()
  > library(ROCR)   # provides prediction() and performance()
  > banktrain <- read.table("bank-sample.csv", header=TRUE, sep=",")
  > drops <- c("balance","day","campaign","pdays","previous","month")
  > banktrain <- banktrain[, !(names(banktrain) %in% drops)]
  > banktest <- read.table("bank-sample-test.csv", header=TRUE, sep=",")
  > banktest <- banktest[, !(names(banktest) %in% drops)]
  > nb_model <- naiveBayes(subscribed ~ ., data=banktrain)
  > nb_prediction <- predict(nb_model, banktest[, -ncol(banktest)], type='raw')
  > score <- nb_prediction[, "yes"]
  > actual_class <- banktest$subscribed == 'yes'
  > pred <- prediction(score, actual_class)  # prediction() is from ROCR, so the package must be loaded
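
Continuing the sketch above, ROCR's performance() turns the prediction object into the ROC curve shown on the next slide, plus an AUC figure.

  perf <- performance(pred, "tpr", "fpr")   # ROC: true vs. false positive rate
  plot(perf, lwd = 2)
  abline(a = 0, b = 1, lty = 2)             # diagonal = random guessing
  performance(pred, "auc")@y.values[[1]]    # area under the curve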

  32. Diagnostics of Classifiers. ROC curve: good for evaluating binary detection. Bank marketing: 2000-record training set + 100-record test set (figure: the resulting ROC curve).

  33. Chapter Sections. Decision trees: overview, general algorithm, decision tree algorithms, evaluating a decision tree. Naïve Bayes: Bayes' theorem, the naïve Bayes classifier, smoothing, diagnostics. Diagnostics of classifiers. Additional classification methods.

  34. Additional Classification Methods. Ensemble methods use multiple models. Bagging: a bootstrap method that uses repeated sampling with replacement. Boosting: similar to bagging, but an iterative procedure. Random forest: uses an ensemble of decision trees. These models usually perform better than a single decision tree. Support Vector Machine (SVM): a linear model using a small number of support vectors.
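
A short sketch of the random forest ensemble with the randomForest package, assuming the banktrain and banktest frames from the earlier examples; the output must be a factor for classification.

  library(randomForest)
  banktrain$subscribed <- as.factor(banktrain$subscribed)
  set.seed(42)                                   # reproducible bootstrap samples
  rf <- randomForest(subscribed ~ ., data = banktrain, ntree = 500)
  predict(rf, banktest)                          # majority vote over the 500 trees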

  35. Summary. How to choose a suitable classifier among decision trees, naïve Bayes, and logistic regression.
