
Model Evaluation and Metrics in Machine Learning
Explore the basics of evaluating models, from training and testing data to common evaluation risks like overfitting. Learn about types of mistakes, confusion matrices, and key evaluation metrics like accuracy, precision, recall, false positive rate, and false negative rate.
Presentation Transcript
The Basics of Evaluating Models: Evaluation is Creation (Geoff Hulten)
Evaluation is Creation

A machine learning pipeline maps data to output: data -> ML algorithm -> model -> output. Run two such pipelines and you may end up with one model that usually says ŷ = 1 and a second model that usually says ŷ = 0. That raises the questions evaluation has to answer:
- Does this model do a good job of mapping data to output?
- Is one model better at it than another?
- Are the mistakes similar or different? Which is better?
- If I've tried 1,000 models, which should I use?

We're going to spend a lot of time on evaluation and on how to interpret the results of evaluations.
Training and Testing Data

1) Training set: used to build the model.
2) Validation set: used to tune the parameters of the model.
3) Test set: used to estimate how well the model works.

Common pattern (a runnable sketch follows below):

for p in parametersToTry:
    model.fit(trainX, trainY, p)
    accuracies[p] = evaluate(validationY, model.predict(validationX))
bestPFound = bestParametersFound(accuracies)
finalModel.fit(trainX + validationX, trainY + validationY, bestPFound)
estimateOfGeneralizationPerformance = evaluate(testY, finalModel.predict(testX))
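Here is a minimal, runnable version of that pattern, assuming scikit-learn. The classifier (LogisticRegression), its regularization parameter C, and the synthetic data are illustrative stand-ins for the slide's abstract model and parametersToTry.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Illustrative synthetic data: 1,000 examples, 5 features, a linear concept.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# 60% train, 20% validation, 20% test.
trainX, validationX, testX = X[:600], X[600:800], X[800:]
trainY, validationY, testY = y[:600], y[600:800], y[800:]

# Fit each candidate parameter on the training set; score on the validation set.
parametersToTry = [0.01, 0.1, 1.0, 10.0]
accuracies = {}
for p in parametersToTry:
    model = LogisticRegression(C=p).fit(trainX, trainY)
    accuracies[p] = accuracy_score(validationY, model.predict(validationX))
bestPFound = max(accuracies, key=accuracies.get)

# Refit on train + validation with the best parameter, then touch the test
# set exactly once to estimate generalization performance.
finalModel = LogisticRegression(C=bestPFound).fit(
    np.concatenate([trainX, validationX]),
    np.concatenate([trainY, validationY]))
print(accuracy_score(testY, finalModel.predict(testX)))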
Risks with Evaluation

Failure to generalize:
1) If you test on the same data you train on, you'll be too optimistic.
2) If you evaluate on the test data a lot as you're debugging, you'll be too optimistic.

Failure to learn the best model you can:
3) If you reserve too much data for testing, you might not learn as good a model.

We'll get into more detail on how to make this tradeoff. For now:
- Very little data (100s of examples): maybe up to 50% for validation + test.
- Tons of data (millions+): maybe ten thousand examples for validation + test.
- Assignment 1 (1000s of examples): we'll use 20% for validation + test (see the split sketch below).
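One way to carve out that 20%, assuming scikit-learn's train_test_split and reusing the X and y arrays from the sketch above; splitting the holdout evenly into 10% validation and 10% test is an illustrative choice, not part of the slide.

from sklearn.model_selection import train_test_split

# Hold out 20% of the data...
trainX, holdX, trainY, holdY = train_test_split(
    X, y, test_size=0.20, random_state=0)
# ...then split the holdout in half: 10% validation, 10% test.
validationX, testX, validationY, testY = train_test_split(
    holdX, holdY, test_size=0.50, random_state=0)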
Types of Mistakes: Confusion Matrix (Binary Classification)

Actual:     0 0 0 0 0 1 1 1 1 1
Prediction: 1 1 1 0 0 0 0 1 1 1

Tallying prediction against actual gives the confusion matrix (counted in the sketch below):

               Actual 1               Actual 0
Predicted 1    true positives:  3     false positives: 3
Predicted 0    false negatives: 2     true negatives:  2
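A few lines of plain Python tally those four cells from the slide's two label vectors; nothing here is assumed beyond the data above.

actual     = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
prediction = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

pairs = list(zip(actual, prediction))
tp = pairs.count((1, 1))  # true positives:  3
fn = pairs.count((1, 0))  # false negatives: 2
fp = pairs.count((0, 1))  # false positives: 3
tn = pairs.count((0, 0))  # true negatives:  2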
Basic Evaluation Metrics

For the same ten examples (TP = 3, FN = 2, FP = 3, TN = 2):

Accuracy: what fraction does it get right?
    (TP + TN) / Total = (3 + 2) / 10 = 0.5
Precision: when it says 1, how often is it right?
    TP / (TP + FP) = 3 / (3 + 3) = 0.5
Recall: what fraction of the 1s does it get right?
    TP / (TP + FN) = 3 / (3 + 2) = 0.6
False positive rate: what fraction of the 0s are called 1s?
    FP / (FP + TN) = 3 / (3 + 2) = 0.6
False negative rate: what fraction of the 1s are called 0s?
    FN / (TP + FN) = 2 / (3 + 2) = 0.4
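The same numbers, computed directly from the definitions (counts taken from the confusion matrix above):

tp, fn, fp, tn = 3, 2, 3, 2
total = tp + fn + fp + tn

accuracy            = (tp + tn) / total  # 5 / 10 = 0.5
precision           = tp / (tp + fp)     # 3 / 6  = 0.5
recall              = tp / (tp + fn)     # 3 / 5  = 0.6
false_positive_rate = fp / (fp + tn)     # 3 / 5  = 0.6
false_negative_rate = fn / (tp + fn)     # 2 / 5  = 0.4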
Example of Needing the Metrics

Now consider 100 examples with skewed counts: TP = 90, FN = 0, FP = 9, TN = 1.

Accuracy: (TP + TN) / Total = (90 + 1) / 100 = 91%
False negative rate: FN / (TP + FN) = 0 / 90 = 0%
False positive rate: FP / (FP + TN) = 9 / (9 + 1) = 90%

Accuracy and false negative rate look excellent, yet the model mislabels 90% of the 0s, which is why no single metric tells the whole story.
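The same formulas reproduce these percentages; the 90/0/9/1 cell counts are reconstructed from the slide's numbers.

tp, fn, fp, tn = 90, 0, 9, 1

accuracy            = (tp + tn) / 100   # 0.91: looks excellent
false_negative_rate = fn / (tp + fn)    # 0.00: never misses a 1
false_positive_rate = fp / (fp + tn)    # 0.90: calls almost every 0 a 1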
Summary
- Evaluation is creation.
- Training data, validation data, test data: learn the reasons for using them and the common pattern.
- There are many types of mistakes: false positive, false negative, precision, recall, etc.