Approaching ML Problems: Important Steps and Defining Success

Learn the essential steps in understanding and approaching a machine learning problem: understanding the environment, defining success metrics, collecting and labeling data, and getting ready to evaluate.

  • Machine Learning
  • Data Collection
  • Success Metrics
  • ML Problems
  • Evaluation




Presentation Transcript


  1. Approaching an ML Problem Geoff Hulten

  2. Blink Detection
     • Iris login
     • Self-driving cars
     • Gaze tracking
     • Use in games

  3. Important Steps
     • Understanding the environment
     • Defining success
     • Getting data
     • Getting ready to evaluate
     • Simple heuristics
     • Machine learning
     • Understanding tradeoffs
     • Assess and iterate

  4. Understanding the Environment
     • Where will the input come from? A sensor? Something standard?
     • Where will it be used? How will it be used?
     • Open vs. closed environment? What percentage open?
     • Form of the input: image? video?
     • Resources: preprocessing? image processing? localization?

  5. Defining Success
     • Good goals:
       • Are meaningful to all participants
       • Are achievable
       • Are measurable
     • Types of goals:
       • Model properties: FPR/FNR, precision/recall
       • User outcomes: (iris login) time to log in / battery per login; (self-driving) number of accidents / interaction; (gaze tracking) time to select / click
       • Leading indicators: engagement / sentiment
       • Organization objectives: revenue
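The model-property metrics named on this slide can be sketched from raw prediction counts; the counts in the example are made-up illustrative numbers, not figures from the talk:

```python
# Minimal sketch: FPR, FNR, precision, and recall from raw counts.
# tp/fp/tn/fn = true/false positives/negatives.

def classification_metrics(tp, fp, tn, fn):
    """Return the slide's model-property metrics as a dict."""
    return {
        "fpr": fp / (fp + tn),         # false positive rate
        "fnr": fn / (fn + tp),         # false negative rate
        "precision": tp / (tp + fp),   # P
        "recall": tp / (tp + fn),      # R
    }

# hypothetical counts for illustration
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

Which pair matters depends on the user outcome: for iris login, a false accept (FPR) is a security problem, while a false reject (FNR) costs the user time.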

  6. Get Data
     • Bootstrap: web data; a data collection experience; buy a curated set
     • Usage: tie to performance; implicit success and failure; data collection

  7. Collecting and Labeling Data
     • Data collection:
       • Environment: lab? mobile? the usage environment?
       • Subjects and coverage
       • Script, staged, or performance task
     • Data labeling:
       • Part of the collection process
       • Pay people; tools
       • Managing thousands of labels / workflow
       • Dealing with noise and mistakes
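One common way to deal with noise and mistakes from paid labelers is majority voting over multiple labels per item. A minimal sketch, with hypothetical item IDs and labels:

```python
# Aggregate noisy human labels by majority vote.
from collections import Counter

def majority_label(labels):
    """Return the most frequent label among one item's votes."""
    return Counter(labels).most_common(1)[0][0]

# hypothetical votes from three labelers per image
votes = {
    "img_001": ["blink", "blink", "open"],
    "img_002": ["open", "open", "open"],
}
consensus = {item: majority_label(v) for item, v in votes.items()}
```

Real labeling workflows also track per-labeler agreement so consistently unreliable workers can be filtered out.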

  8. Get Ready to Evaluate
     • Offline: set aside independent data
     • Online: telemetry; have good coverage of users
     • Intelligence management: silent rollouts / flights
     • Evaluation framework
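"Set aside independent data" can be sketched as a reproducible shuffled holdout split using only the standard library; the 80/20 ratio and fixed seed are assumptions for illustration, not something the slide specifies:

```python
# Reserve an independent holdout set for offline evaluation.
import random

def holdout_split(examples, test_fraction=0.2, seed=42):
    """Shuffle and split examples; fixed seed keeps the split reproducible."""
    rng = random.Random(seed)
    shuffled = examples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = holdout_split(list(range(100)))
```

The point of the fixed seed is that everyone on the team evaluates against the same holdout, so numbers stay comparable across runs.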

  9. Understand the Problem / Simple Heuristics
     • Find systematic noise, do some cleaning
     • Hand-craft a heuristic model: a simple decision tree, a few rules
     • Find the challenges
     • Create a baseline
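The "few rules" heuristic model might look like the following for the deck's blink-detection example; the feature names and thresholds here are hypothetical, chosen only to show the shape of a hand-crafted baseline:

```python
# A hand-crafted heuristic baseline: two simple rules, no learning.
# eye_aspect_ratio and frame_brightness are assumed, illustrative features.

def blink_heuristic(eye_aspect_ratio, frame_brightness):
    """Predict 'blink' or 'open' from two rules; a baseline, not a model."""
    if frame_brightness < 0.05:      # too dark to judge -> default to open
        return "open"
    if eye_aspect_ratio < 0.2:       # eyelids nearly closed
        return "blink"
    return "open"
```

Even a crude baseline like this surfaces the hard cases (dark frames, glasses, partial occlusion) and gives the learned model a number to beat.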

  10. Machine Learning (pt 1)
     • Get a data set: contexts and labels
     • Set some aside to evaluate final accuracy (final test set)
     • Load the data and labels
     • Implement some basic features: easy, standard stuff for the domain
     • Do some runs to estimate the time it will take to train lots of models; use this to decide how much initial search to do
     • Use cross validation to do some parameter tuning
     • Look at the predictions in detail:
       • Accuracy, FP rate, FN rate, precision, recall, confusion matrix
       • ROC curves of a couple of parameter settings
       • Properties of the model search (training-set loss vs. iterations)
       • Learning curves of training-set accuracy vs. validation-set accuracy
       • Some of the worst mistakes it is making
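The cross-validation tuning loop described above can be sketched with a toy threshold classifier; the data, candidate thresholds, and fold count are made up, and a real run would fit a learner on the training folds rather than scoring a fixed hyperparameter directly:

```python
# Simplified k-fold loop for tuning one parameter of a threshold classifier.

def k_folds(data, k):
    """Split data into k equal contiguous folds."""
    size = len(data) // k
    return [data[i * size:(i + 1) * size] for i in range(k)]

def accuracy(threshold, examples):
    """Fraction of (x, label) pairs where (x >= threshold) matches label."""
    return sum((x >= threshold) == y for x, y in examples) / len(examples)

def tune_threshold(examples, candidates, k=5):
    """Pick the candidate with the best average accuracy across folds."""
    folds = k_folds(examples, k)
    best, best_score = None, -1.0
    for t in candidates:
        score = sum(accuracy(t, fold) for fold in folds) / k
        if score > best_score:
            best, best_score = t, score
    return best, best_score

# toy data: positives are exactly the values at or above 0.5
data = [(x / 100, x / 100 >= 0.5) for x in range(100)]
best_t, best_acc = tune_threshold(data, candidates=[0.2, 0.5, 0.8])
```

The structure is the important part: every candidate setting gets scored on held-out folds, and only the averages are compared.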

  11. Machine Learning (pt 2)
     • Make improvements:
       • More complex standard features
       • Custom heuristic features
       • Try to find more data
       • Clean/remove bad data (obvious noise)
       • Focus in on the most useful areas of parameter space
       • Try other learning algorithms that might be better matches
     • Iterate for as long as you want / need to
     • Now build a single model with the best parameter settings and all the training data (no cross validation)
     • Evaluate the model on the final test set
     • If the accuracy is about what you expect, start working to deploy everything
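The endgame described above, fit one model on all the training data and then touch the final test set exactly once, can be sketched with the same kind of toy threshold classifier; the "fit" rule and data are illustrative assumptions, not the deck's method:

```python
# After tuning: one final model on ALL training data, one test-set read.

def fit_threshold(train):
    """Toy 'fit': threshold midway between the class means."""
    pos = [x for x, y in train if y]
    neg = [x for x, y in train if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def evaluate(threshold, test):
    """Accuracy of (x >= threshold) against the labels."""
    return sum((x >= threshold) == y for x, y in test) / len(test)

# disjoint toy train and final-test splits
train = [(x / 100, x / 100 >= 0.5) for x in range(0, 100, 2)]
final_test = [(x / 100, x / 100 >= 0.5) for x in range(1, 100, 2)]

final_threshold = fit_threshold(train)                   # train on everything
final_accuracy = evaluate(final_threshold, final_test)   # evaluate once
```

Evaluating the final test set only once is what keeps that number an honest estimate; once you start iterating against it, it becomes another validation set.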

  12. Understanding the Tradeoffs
     • Accuracy vs. CPU cycles
     • Accuracy vs. RAM
     • Latency in execution
     • How fast is the problem changing?
     • Worst (most costly) mistakes
     • Planning for context & features
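One of these tradeoffs, execution latency, is easy to measure directly with the standard library's timeit; the predict function below is a stand-in you would replace with real model inference:

```python
# Measure average per-call prediction latency.
import timeit

def predict(x):
    """Placeholder for a real model's inference call."""
    return x >= 0.5

calls = 10_000
# total wall time over many calls, divided out to seconds per prediction
latency = timeit.timeit(lambda: predict(0.3), number=calls) / calls
```

Measuring this early matters because a model that is accurate offline but too slow on the target device (a phone doing iris login, a car doing gaze tracking) fails the deployment constraints anyway.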

  13. Maturity in Machine Learning
     • You did it once
     • You could do it again
     • You could do it again easily
     • Someone else could do it
     • The computer does it
     • The computer does it and deploys it
