
Machine Learning and Data Mining Overview
Explore the world of data mining and machine learning, from extracting valuable patterns to industrial applications. Discover the basic procedures, machine learning in industry, industrial ML applications, workflows, and real-life examples like California house sales prediction.
Presentation Transcript
Data mining vs. machine learning. Data mining: extracting interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data. It is a more general area than machine learning: it covers data processing, pattern extraction, and most machine learning methods (e.g., supervised learning, unsupervised learning). Data mining and machine learning play critical roles in industrial applications and scientific research.
Basic procedure. (1) Data preprocessing: preparing data for mathematical modeling, e.g., clean and convert raw data (image/text/tabular) to vector data. (2) Model learning / machine learning: supervised and unsupervised learning; algorithms and optimization. (3) Model validation / inference / explanation: check model quality; deploy models; visualize and explain the model's working mechanism and outputs.
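The three stages above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names `sqft` and `price`, and the choice of a one-feature least-squares line are all hypothetical and do not come from the slides.

```python
from statistics import mean

# Hypothetical raw tabular records: "sqft" arrives as text, one value is missing.
raw = [
    {"sqft": "1000", "price": 200.0},
    {"sqft": "1500", "price": 300.0},
    {"sqft": "", "price": 250.0},
    {"sqft": "2000", "price": 400.0},
    {"sqft": "2500", "price": 500.0},
]

def preprocess(records):
    """Stage 1: drop records with a missing feature, convert text to numbers."""
    return [(float(r["sqft"]), r["price"]) for r in records if r["sqft"]]

def fit_line(pairs):
    """Stage 2: learn slope and intercept by ordinary least squares."""
    xs, ys = zip(*pairs)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def mean_abs_error(model, pairs):
    """Stage 3: check model quality on data not used for training."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in pairs)

data = preprocess(raw)
train, held_out = data[:3], data[3:]
model = fit_line(train)
error = mean_abs_error(model, held_out)
print(model, error)
```

Holding out part of the data before fitting is what makes the final number a check of model quality rather than a measure of how well the model memorized its training set.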
Machine learning (ML) in industry. About 10 years ago, ML was still mainly used by Big Tech. It is now common for companies to use ML to drive revenue. Top segments are: high-tech, automotive, manufacturing, retail, finance, healthcare.
Example: California house sales prediction. The goal is to predict the bid price for the winning buyer. https://www.kaggle.com/c/california-house-prices/
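A first step on a price-prediction task like this is often a trivial baseline that any real model must beat. The sketch below is an assumption-laden illustration: the tiny table, the column names (`sqft`, `bedrooms`, `sold_price`), and the log-scale error are all hypothetical, and are not taken from the competition's data or its official metric.

```python
import csv
import io
import math
from statistics import median

# Hypothetical miniature of a house-sales table; column names are illustrative.
CSV_TEXT = """sqft,bedrooms,sold_price
1200,2,400000
1800,3,650000
2400,4,900000
1500,3,520000
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
prices = [float(r["sold_price"]) for r in rows]

# Hold out the last row for validation.
train_prices, test_prices = prices[:-1], prices[-1:]

# Baseline: always predict the median training price.
baseline = median(train_prices)

def rmse_log(preds, targets):
    """Root-mean-square error on log prices, so errors are relative."""
    squared = [(math.log(p) - math.log(t)) ** 2 for p, t in zip(preds, targets)]
    return math.sqrt(sum(squared) / len(targets))

score = rmse_log([baseline] * len(test_prices), test_prices)
print(baseline, round(score, 4))
```

Measuring error on the log scale is one common choice for prices, since a 10% miss on a cheap house then counts the same as a 10% miss on an expensive one.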
Challenges in each stage. Formulate problem: focus on the most impactful problems. Data: high-quality data is scarce; privacy issues. Train models: ML models are increasingly complex, data-hungry, and expensive. Deploy models: heavy computation is not suitable for real-time inference. Monitor: data distribution shifts; privacy/fairness issues.
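The monitoring challenge, a shift in data distribution, can be illustrated with a very simple check: compare a feature's mean in live data against its mean in the training data. This is a minimal sketch, not a production drift detector; the feature values and the `THRESHOLD` of 1.0 are hypothetical.

```python
from statistics import mean, stdev

def shift_score(train_values, live_values):
    """How far the live mean has moved, in training standard deviations."""
    return abs(mean(live_values) - mean(train_values)) / stdev(train_values)

# Hypothetical square-footage values seen at training time and in production.
train_sqft = [1000, 1200, 1400, 1600, 1800]
live_same = [1100, 1300, 1500, 1700]
live_shifted = [2600, 2800, 3000, 3200]

THRESHOLD = 1.0  # hypothetical alert level; tune per feature in practice

drifted_same = shift_score(train_sqft, live_same) > THRESHOLD
drifted_shifted = shift_score(train_sqft, live_shifted) > THRESHOLD
print(drifted_same, drifted_shifted)
```

A mean comparison only catches shifts in location; changes in spread or shape would need a fuller comparison of the two distributions.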
Course topics: techniques a data scientist needs, often not covered by other courses. Data: collect/preprocess data, labeling data, data cleaning, data transformation, feature engineering, feature selection, dimensionality reduction; types of data: image, text, graph. Supervised learning: modeling methods, model validation, model combination. Unsupervised and semi-supervised learning: clustering, association rules, graph link analysis. * Deployment and monitoring are not covered by this course.
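Two of the data topics listed above, data transformation and feature engineering, can be shown on a toy example: standardizing a numeric column and one-hot encoding a categorical one so that both end up as vector data. The column names and values below are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale a numeric column to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """Encode a categorical column as 0/1 indicator vectors."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical columns from a housing table.
sqft = [1000.0, 2000.0, 3000.0]
house_type = ["condo", "house", "condo"]

sqft_scaled = standardize(sqft)
type_encoded = one_hot(house_type)

# Concatenate per-row into one feature vector.
features = [[s] + t for s, t in zip(sqft_scaled, type_encoded)]
print(features)
```

In a real pipeline the mean, standard deviation, and category list would be computed on the training data only and then reused unchanged on validation and live data.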
Roles in DM/ML. Domain experts: have business insights, know what data is important and where to find it, and identify the real impact of an ML model. Data scientists: full stack on data mining, model training, and deployment. ML experts: customize state-of-the-art ML models. Software development engineers: develop and maintain data pipelines and model training and serving pipelines.
Summary. ML has become a staple of modern business. An ML workflow includes: formulating the problem, preparing data, training models, deploying models, and monitoring. Data mining is about the technologies supporting this workflow, which are the focus of this course.