ROC Curves and Operating Points in Machine Learning

Explore the concepts of ROC curves, changing thresholds in logistic regression, comparing models using ROC curves, precision-recall curves, AUC, and finding optimal operating points for better classification performance. Learn how to balance mistakes for your application and make informed decisions in model evaluation.

  • Machine Learning
  • ROC Curves
  • Logistic Regression
  • Classification
  • Performance Evaluation

Presentation Transcript


  1. ROC Curves and Operating Points (Geoff Hulten)

  2. Classifications and Probability Estimates

     Logistic regression produces a score (a probability estimate). A threshold turns
     that score into a classification. What happens if you vary the threshold?
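
To make this concrete, here is a minimal runnable sketch (not part of the slides) that fits a scikit-learn LogisticRegression on synthetic data and applies different thresholds to its probability estimates; the dataset and the particular threshold values are illustrative assumptions:

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        # Synthetic data stands in for a real problem.
        X, y = make_classification(n_samples=200, random_state=0)
        model = LogisticRegression().fit(X, y)

        scores = model.predict_proba(X)[:, 1]  # probability estimates for class 1

        # The same scores yield different classifications as the threshold moves.
        for threshold in (0.5, 0.6, 0.7):
            predictions = (scores >= threshold).astype(int)
            print(f"threshold {threshold}: {predictions.sum()} examples classified as 1")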

  3. Example of Changing Thresholds

     Score   Y
     .25     0
     .45     0
     .55     1
     .67     0
     .82     1
     .95     1

     Threshold = .5: False Positive Rate 33%, False Negative Rate 0%
     Threshold = .6: False Positive Rate 33%, False Negative Rate 33%
     Threshold = .7: False Positive Rate 0%,  False Negative Rate 33%
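
The rates on this slide can be reproduced directly. The sketch below assumes the table's second column is the true label and recomputes the false positive and false negative rates at each threshold:

        import numpy as np

        # Scores and true labels from the slide's example.
        scores = np.array([0.25, 0.45, 0.55, 0.67, 0.82, 0.95])
        y_true = np.array([0, 0, 1, 0, 1, 1])

        for threshold in (0.5, 0.6, 0.7):
            y_pred = (scores >= threshold).astype(int)
            fp_rate = ((y_pred == 1) & (y_true == 0)).sum() / (y_true == 0).sum()
            fn_rate = ((y_pred == 0) & (y_true == 1)).sum() / (y_true == 1).sum()
            print(f"threshold {threshold}: FP rate {fp_rate:.0%}, FN rate {fn_rate:.0%}")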

  4. ROC Curve
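
The curve itself was a figure in the original slide. As a stand-in, this is a minimal sketch of computing and plotting an ROC curve with scikit-learn's roc_curve on synthetic data; the data and model are illustrative assumptions:

        import matplotlib.pyplot as plt
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_curve
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=1000, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        model = LogisticRegression().fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]

        # Each point on the curve corresponds to one threshold on the scores.
        fpr, tpr, thresholds = roc_curve(y_test, scores)
        plt.plot(fpr, tpr)
        plt.xlabel("False Positive Rate")
        plt.ylabel("True Positive Rate (1 - False Negative Rate)")
        plt.title("ROC Curve")
        plt.show()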

  5. Comparing Models with ROC Curves
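
A common way to compare models is to plot their ROC curves on the same axes and report each curve's AUC. The sketch below does this for a logistic regression and a shallow decision tree; the second model and the dataset are illustrative assumptions, not from the slides:

        import matplotlib.pyplot as plt
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score, roc_curve
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1000, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        models = {
            "Logistic Regression": LogisticRegression(),
            "Decision Tree (depth 3)": DecisionTreeClassifier(max_depth=3, random_state=0),
        }
        for name, model in models.items():
            scores = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
            fpr, tpr, _ = roc_curve(y_test, scores)
            plt.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y_test, scores):.2f})")

        plt.xlabel("False Positive Rate")
        plt.ylabel("True Positive Rate")
        plt.legend()
        plt.show()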

  6. More ROC Comparisons

  7. Some Common ROC Problems

  8. Precision Recall Curves (PR Curves)

     [Figure: a PR curve annotated with the regions "Everything Classified Correctly",
     "Incremental Classifications More Accurate", "Incremental Classifications Less
     Accurate", "First Set of Mistakes", and "Everything Classified as 1"]
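
As a stand-in for the slide's figure, this is a minimal sketch of plotting a precision-recall curve from the same kind of scores using scikit-learn's precision_recall_curve; the data and model are illustrative assumptions:

        import matplotlib.pyplot as plt
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import precision_recall_curve
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=1000, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        scores = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]

        # As with ROC curves, each point corresponds to one threshold on the scores.
        precision, recall, thresholds = precision_recall_curve(y_test, scores)
        plt.plot(recall, precision)
        plt.xlabel("Recall")
        plt.ylabel("Precision")
        plt.title("Precision-Recall Curve")
        plt.show()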

  9. Area Under Curve (AUC)

     Integrate the area under the ROC curve. A perfect score is 1; higher scores
     allow for generally better tradeoffs. A score of 0.5 indicates a random model,
     and a score below 0.5 indicates you are doing something wrong.

     [Figure: two example ROC curves, one with AUC ~ .97 and one with AUC ~ .89]
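
The sketch below computes AUC two ways on the six-example dataset from the threshold slide: by trapezoidal integration of the ROC curve and directly from the scores with roc_auc_score. The two values agree; reusing that small dataset here is only for illustration:

        import numpy as np
        from sklearn.metrics import auc, roc_auc_score, roc_curve

        # The six scores and labels from the threshold example above.
        scores = np.array([0.25, 0.45, 0.55, 0.67, 0.82, 0.95])
        y_true = np.array([0, 0, 1, 0, 1, 1])

        fpr, tpr, _ = roc_curve(y_true, scores)
        print(auc(fpr, tpr))                  # integrate the area under the ROC curve
        print(roc_auc_score(y_true, scores))  # same value computed directly from the scores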

  10. Operating Points

     Balance mistakes for your application: spam, for example, needs a low false
     positive rate. Use separate holdout data to find the threshold.

  11. Pattern for using operating points

        # Train the model and tune parameters on training and validation data first.

        # Evaluate the model on extra holdout data, reserved for threshold setting.
        xThreshold, yThreshold = ReservedData()

        # Find the threshold that achieves the operating point on this extra holdout data.
        potentialThresholds = {}
        for t in [i / 100 for i in range(1, 101)]:   # candidate thresholds from 1% to 100%
            potentialThresholds[t] = FindFPRate(model.Predict(xThreshold, t), yThreshold)
        bestThreshold = FindClosestThreshold(<target>, potentialThresholds)

        # Evaluate on test data with the selected threshold to estimate generalization performance.
        performanceAtOperatingPoint = FindFNRate(model.Predict(xTest, bestThreshold), yTest)

        # Make sure nothing went crazy: the test FP rate should be close to the one
        # measured while setting the threshold.
        if FindFPRate(model.Predict(xTest, bestThreshold), yTest) <is not close to> potentialThresholds[bestThreshold]:
            # Problem?
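
Below is a minimal runnable sketch of the threshold-search step from this pattern, with the slide's FindFPRate and FindClosestThreshold helpers written out explicitly; the helper name find_fp_rate, the toy holdout data, and the 10% target false positive rate are illustrative assumptions:

        import numpy as np

        def find_fp_rate(y_pred, y_true):
            # Fraction of true negatives that were classified as positive.
            return ((y_pred == 1) & (y_true == 0)).sum() / (y_true == 0).sum()

        # Toy stand-ins for the reserved holdout scores and labels.
        scoresThreshold = np.array([0.1, 0.3, 0.55, 0.6, 0.8, 0.9])
        yThreshold = np.array([0, 0, 1, 0, 1, 1])

        targetFPRate = 0.10  # illustrative operating point

        potentialThresholds = {}
        for t in np.arange(0.01, 1.01, 0.01):  # candidate thresholds from 1% to 100%
            potentialThresholds[t] = find_fp_rate((scoresThreshold >= t).astype(int), yThreshold)

        # Pick the threshold whose holdout FP rate is closest to the target.
        bestThreshold = min(potentialThresholds,
                            key=lambda t: abs(potentialThresholds[t] - targetFPRate))
        print(bestThreshold, potentialThresholds[bestThreshold])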
