Boost Machine Learning Transparency with ATMSeer Visualization Tool

"Learn about how ATMSeer enhances transparency and controllability in Automated Machine Learning by providing interactive visualization at algorithm, hyperpartition, and hyperparameter levels. Explore diverse ML models efficiently with this innovative tool."

  • Machine Learning
  • Transparency
  • Visualization
  • Automated ML
  • Controllability


Presentation Transcript


  1. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning
     Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, and Huamin Qu.
     Presented by: Prakhar Singh

  2. Intro
     • AutoML automatically iterates through various machine learning algorithms and optimizes hyperparameters in a predefined search space. It has gained popularity and significant research attention in recent years (a minimal sketch of such a search loop follows this slide).
     • Problem: lack of transparency. AutoML systems are black boxes that leave users in the dark about how models were chosen and whether the system properly explored the available options.
     • Solution: ATMSeer, an interactive visualization tool that provides a visual summary of the models it has searched, increasing transparency.
     • ATMSeer lets users explore and analyze the search at three levels: the algorithm level (which types of models were used), the hyperpartition level (groups of similar models), and the hyperparameter level (specific settings for each model).
     • It also enables users to interactively modify the search space in real time.
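
For intuition, the sketch below shows the kind of loop an AutoML system runs: sample an algorithm and hyperparameters from a predefined search space, train, score, repeat. This is not ATMSeer's actual backend (the paper builds on the ATM library); the algorithms, ranges, dataset, and budget here are illustrative assumptions.

```python
# Minimal, assumed sketch of an AutoML search loop (not ATMSeer's real backend).
import random
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Predefined search space: algorithm name -> sampler of a configured model.
search_space = {
    "knn": lambda: KNeighborsClassifier(n_neighbors=random.randint(1, 30)),
    "rf":  lambda: RandomForestClassifier(n_estimators=random.randint(10, 200)),
    "svm": lambda: SVC(C=10 ** random.uniform(-2, 2)),
}

trials = []
for _ in range(20):                       # computational budget: 20 models
    algo = random.choice(list(search_space))
    model = search_space[algo]()
    score = cross_val_score(model, X, y, cv=3, scoring="f1").mean()
    trials.append((algo, model.get_params(), score))

best = max(trials, key=lambda t: t[2])
print(f"best algorithm: {best[0]}, F1 = {best[2]:.3f}")
```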

  3. Related Work
     • Choosing ML models: while these works provide useful guidance, they fail to give detailed instructions for a particular problem. ATMSeer automatically tries different models.
     • Visualizing automated ML: these works only support the analysis of one type of model at a time. ATMSeer supports the analysis of machine learning models generated with various algorithms.
     • Visualizing ML models: the requirements of model users are not thoughtfully considered. ATMSeer allows users to easily observe and analyze models through an interactive visualization.

  4. System Requirements and Design
     • Goal: help users efficiently search, analyze, and choose machine learning models for specific tasks.
     • Target users: people with machine learning expertise who have struggled with manual, error-prone model search processes.
     • Data abstraction: the AutoML process is framed as training a sequence of models on a dataset, where each model is a multivariate data point consisting of an algorithm (categorical), a hyperpartition (a set of categorical variables), hyperparameters (numerical variables), and performance (numerical). A sketch of this abstraction follows this slide.
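
The following is a rough sketch of that data abstraction as a record type. The field names and example values are my own assumptions for illustration, not ATMSeer's actual schema.

```python
# Assumed sketch of the "model as a multivariate data point" abstraction.
from dataclasses import dataclass
from typing import Dict

@dataclass
class ModelRecord:
    algorithm: str                     # categorical, e.g. "svm", "knn"
    hyperpartition: Dict[str, str]     # categorical choices, e.g. kernel type
    hyperparameters: Dict[str, float]  # numerical settings, e.g. C, gamma
    performance: float                 # numerical score, e.g. F1 or accuracy

# One point in the AutoML search history (values are made up).
record = ModelRecord(
    algorithm="svm",
    hyperpartition={"kernel": "rbf"},
    hyperparameters={"C": 3.2, "gamma": 0.01},
    performance=0.91,
)
```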

  5. System Requirements and Design (cont.)
     • The authors interviewed six participants (three domain experts, three machine learning experts) to understand how they choose ML models and to find opportunities to improve the experience. Participants discussed their experiences with model development and their views on AutoML, and used a pilot system to solve a classification problem.
     • Three key decisions identified from the interviews guide the AutoML process:
       • Search space: how many algorithms will be searched?
       • Computational budget: how long will the process run?
       • Model choice: which model is the best among the searched models?
     • Design requirements (to assist in making these decisions):
       1. Provide an overview of the AutoML process to track performance and model search progression.
       2. Connect models with the search space to facilitate model selection and search space modification.
       3. Offer guidance for search space modifications to improve search efficiency.
       4. Allow seamless in-situ search space configuration for ease of use.
       5. Support multi-granularity analysis to help users understand and monitor the hierarchical structure of the search space.

  6. ATMSeer
     • Architecture: a client-server design. The server coordinates AutoML processes, data storage, and APIs; the client provides a visual interface with controls and insights into the AutoML process.
     • Interface design: three key sections.
       • Control Panel: for uploading datasets and managing AutoML processes.
       • Overview Panel: provides high-level statistics, performance summaries, and a list of top models for comparison.
       • AutoML Profiler: a three-level profiler for detailed analysis at the algorithm, hyperpartition, and hyperparameter levels.
     • Interaction design:
       • Real-time control: dynamic updates let users monitor AutoML in real time and perform "run-reconfigure-run" workflows.
       • In-situ search space configuration: users can modify the search space directly within the interface they use for analyzing models (see the sketch after this slide).
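
The sketch below illustrates what an in-situ "run-reconfigure-run" reconfiguration might look like behind the interface: the user inspects results, then narrows the search space without leaving the analysis view. The config format and the update_search_space helper are hypothetical, not ATMSeer's real API.

```python
# Hypothetical sketch of in-situ search space reconfiguration.
search_space = {
    "algorithms": ["svm", "knn", "rf", "et", "mlp"],
    "hyperparameters": {
        "knn": {"n_neighbors": (1, 30)},
        "rf":  {"n_estimators": (10, 200)},
    },
}

def update_search_space(space, algorithms=None, hyperparameters=None):
    """Apply a reconfiguration requested from the visual interface."""
    if algorithms is not None:
        space["algorithms"] = algorithms
    if hyperparameters is not None:
        for algo, ranges in hyperparameters.items():
            space["hyperparameters"].setdefault(algo, {}).update(ranges)
    return space

# After inspecting the overview panel, the user keeps only two promising
# algorithms and tightens one hyperparameter range, then resumes the run.
update_search_space(search_space,
                    algorithms=["knn", "rf"],
                    hyperparameters={"rf": {"n_estimators": (100, 200)}})
```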

  7. Case Studies
     • Setup: two machine learning experts (E1 and E2) used ATMSeer to optimize model selection and performance for two datasets. Algorithms searched: support vector machine (SVM), extra-trees (ET), linear models with SGD training (SGD), k-nearest neighbors (KNN), random forest (RF), multi-layer perceptron (MLP), and Gaussian process (GP).
     • Model selection and analysis (E1): E1 used ATMSeer to find a model for the arsenic-female-bladder dataset, classifying patients as healthy or as having bladder cancer. The search covered 250 models and achieved an F1 score of 0.939. E1 examined the top 10 models, which had similar scores, and selected KNN based on its consistent performance. E1 then investigated underperforming KNN models and identified the "number of neighbors" hyperparameter as the one that affected performance the most.
     • Refining the search space (E2): E2 worked with the Friedman dataset and focused on ET. Restricting the search to ET models improved performance from 0.887 to 0.922. Refining the "max_features" hyperparameter showed that values between 0.7 and 1 yield the best results (a sketch of this kind of refinement follows this slide).
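
Below is a hedged sketch of the kind of refinement E2 performed: limit the search to extra-trees and narrow max_features to the [0.7, 1.0] range. The dataset, budget, and scoring are stand-ins, not the actual Friedman data or scores from the case study.

```python
# Assumed sketch of an ET-only search with a narrowed max_features range.
import random
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

best_score, best_params = 0.0, None
for _ in range(15):                                 # refined, ET-only budget
    params = {
        "n_estimators": random.randint(50, 300),
        "max_features": random.uniform(0.7, 1.0),   # narrowed range
    }
    score = cross_val_score(ExtraTreesClassifier(**params), X, y,
                            cv=3, scoring="f1").mean()
    if score > best_score:
        best_score, best_params = score, params

print(f"best F1 = {best_score:.3f} with {best_params}")
```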

  8. Expert Interview
     • The authors interviewed E1 and E2 to evaluate ATMSeer and summarized three main use cases.
     • Knowledge distillation from AutoML: E2 noted that being able to match prior ML knowledge against the generated visualizations builds confidence in the AutoML process and increases the likelihood of adopting AutoML. E2 also believed ATMSeer could function as an educational tool for ML.
     • Human-machine interaction in AutoML: the experts prefer controlling the process rather than fully relying on automated methods.
     • Diagnosis of AutoML: the experts noticed that some algorithms were disproportionately searched even when their performance was not superior to others, and also noticed issues with the reward function in some AutoML processes.

  9. User Study
     • To avoid unfairly benchmarking a relatively new system, the study evaluated ATMSeer's usability and its impact on user behavior in AutoML processes with 13 participants (graduate students with ML or data science experience but no AutoML experience).
     • Tasks: (1) finding a high-performing model using AutoML; (2) analyzing an AutoML process and answering questions about dataset characteristics and search modification.
     • Participants used ATMSeer in a 40-minute session that included a tutorial, a usability questionnaire, and a post-study interview.
     • Results: most participants found ATMSeer easy to learn and use; 84.6% were confident in their selected model, and 92.3% were willing to use ATMSeer again. Some noted difficulties with hyperparameters and wanted further validation tools. On the objective questions participants performed well (99/104 correct answers), with errors attributed to low familiarity with certain ML models.

  10. User Study (cont.)
     • Revisiting the key decisions:
       • Search space: 10 participants adopted a coarse-to-fine strategy, refining the search space at the algorithm and then the hyperparameter level; fine-grained modifications received less interest.
       • Computational budget: participants based budget increases on performance, model count, algorithm coverage, and best performance so far. Experts tended to search more models than novices.
       • Model choice: while performance drove model selection, nine participants preferred familiar models, and three felt ATMSeer improved their understanding of unfamiliar models.
     • The study demonstrated ATMSeer's usability and effectiveness in facilitating AutoML processes and learning.

  11. Discussion and Future Work
     • Key takeaways:
       • Although initially designed for ML experts, ATMSeer can also benefit beginners by helping them understand how algorithm, hyperpartition, and hyperparameter choices affect model performance.
       • AutoML designers can use ATMSeer for debugging and improving AutoML algorithms.
       • Limitation: small sample size (13 participants, mainly graduate students). Broader studies are needed to generalize the findings, although the positive feedback is encouraging.
       • ATMSeer's visualization supports distinguishing 8-15 algorithms and analyzing more than 1,000 models, ensuring scalability for most real-world tasks; typical AutoML processes involve 100-400 models.
     • Future work:
       • Validate ATMSeer with larger, more diverse user groups and improve the tool based on feedback.
       • Incorporate human-in-the-loop reinforcement learning to detect critical points in AutoML processes where human intervention is needed.
