Large-Scale Multi-Label Classification Challenges

Explore the challenges and solutions in large-scale multi-label classification via the MetaLabeler algorithm. Discover how the One-vs-Rest SVM approach keeps training simple, fast, and scalable, and how a meta model that predicts the number of labels per instance turns score rankings into multi-label predictions. Learn how meta models are trained on examples from domains like clothing and fashion to handle complex predictions effectively.

  • Multi-Label Classification
  • SVM
  • MetaLabeler
  • Machine Learning
  • Data Mining


Presentation Transcript


  1. Large Scale Multi-Label Classification via MetaLabeler. Lei Tang (Arizona State University), Suju Rajan and Vijay K. Narayanan (Yahoo! Data Mining & Research)

  2. Large Scale Multi-Label Classification
     • Huge number of instances and categories
     • Common for online content: query categorization, web page classification, social bookmark/tag recommendation, video annotation/organization

  3. Challenges
     • Multi-class: thousands of categories
     • Multi-label: each instance has more than one label
     • Large scale: huge numbers of instances and categories
     • Our query categorization problem: 1.5M queries, 7K categories
     • Yahoo! Directory: 792K docs, 246K categories (Liu et al. 05)
     • Most existing multi-label methods do not scale: structural SVM, mixture models, collective inference, maximum-entropy models, etc.
     • The simplest One-vs-Rest SVM is still widely used

  4. One-vs-Rest SVM
     An example with four instances and four categories. True label sets: x1 → {C1, C3}; x2 → {C1, C2, C4}; x3 → {C2}; x4 → {C2, C4}. One binary SVM is trained per category, treating instances carrying that label as positive and all others as negative:
     • SVM1 (C1): x1 +, x2 +, x3 -, x4 -
     • SVM2 (C2): x1 -, x2 +, x3 +, x4 +
     • SVM3 (C3): x1 +, x2 -, x3 -, x4 -
     • SVM4 (C4): x1 -, x2 +, x3 -, x4 +
     At prediction time, each SVM scores the instance for its own category.
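The label transformation behind One-vs-Rest training can be sketched in plain Python; the helper name below is my own, not from the paper:

```python
def one_vs_rest_targets(label_sets, classes):
    """For each class, build the binary target vector used to train
    that class's SVM: +1 if the instance carries the label, else -1."""
    return {c: [1 if c in s else -1 for s in label_sets] for c in classes}

# The four instances from the slide and their true label sets.
label_sets = [{"C1", "C3"}, {"C1", "C2", "C4"}, {"C2"}, {"C2", "C4"}]
targets = one_vs_rest_targets(label_sets, ["C1", "C2", "C3", "C4"])
# targets["C1"] == [1, 1, -1, -1]  (positives: x1, x2)
```

Each of the four target vectors reproduces one row of the slide's table, and each can be handed to any binary SVM trainer independently.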

  5. One-vs-Rest SVM
     • Pros: simple, fast, scalable; each label is trained independently, so training is easy to parallelize
     • Cons: highly skewed class distribution (few +, many -); biased prediction scores
     • Still outputs a reasonably good ranking (Rifkin and Klautau 04)
     • e.g. 4 categories C1-C4; true labels for x1 are C1 and C3, and the prediction scores satisfy {s1, s3} > {s2, s4}
     • Open question: how to predict the number of labels?

  6. MetaLabeler Algorithm
     1. Obtain a ranking of class membership for each instance: any generic ranking algorithm can be applied; here, One-vs-Rest SVM
     2. Build a meta model to predict the number of top classes: construct the meta labels, construct the meta features, build the meta model
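Constructing the meta labels in step 2 amounts to counting each training instance's true labels; a minimal sketch (the function name is my own):

```python
def meta_labels(label_sets):
    # The meta label of an instance is simply its number of true labels;
    # these counts become the regression targets for the meta model.
    return [len(s) for s in label_sets]

# The three example queries on the next slide carry 2, 1 and 3 labels.
counts = meta_labels([
    {"Formal Wear", "Women Clothing"},
    {"Children Clothing"},
    {"Fashion", "Women Clothing", "Leather Clothing"},
])
# counts == [2, 1, 3]
```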

  7. Meta Model Training
     Example queries from a clothing/fashion taxonomy (Clothing, Leather Clothing, Women Clothing, Children Clothing, Formal Wear, Fashion):
     • Q1 = affordable cocktail dress; labels: Formal Wear, Women Clothing
     • Q2 = cotton children jeans; labels: Children Clothing
     • Q3 = leather fashion in 1990s; labels: Fashion, Women Clothing, Leather Clothing
     The meta data maps each query to its number of labels (Q1: 2, Q2: 1, Q3: 3). A regression meta-model is trained on this meta data alongside the One-vs-Rest SVM. But how to handle predictions like 2.5 labels?
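A regression meta-model can emit fractional counts like 2.5. One reasonable convention, an assumption on my part rather than a detail taken from the slides, is to round to the nearest integer and clip into the valid range:

```python
def predicted_label_count(raw, max_labels):
    # Round half-up (int(x + 0.5) avoids Python's banker's rounding,
    # where round(2.5) == 2) and clip into the range [1, max_labels].
    return min(max(int(raw + 0.5), 1), max_labels)

# predicted_label_count(2.5, 26) == 3
# predicted_label_count(0.3, 26) == 1
```

An alternative mentioned in the literature is to treat the label count as a multi-class target instead of a regression target, which sidesteps rounding entirely.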

  8. Meta Feature Construction
     • Content-based: use the raw data (the raw data contains all the information)
     • Score-based: use the prediction scores (bias in the scores might be learned); e.g. meta feature = (C1: 0.9, C2: -0.2, C3: 0.7, C4: -0.6)
     • Rank-based: use the sorted prediction scores; e.g. meta feature = (0.9, 0.7, -0.2, -0.6)
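The three meta-feature variants can be sketched as follows, reusing the scores from the slide (the function name is my own):

```python
def meta_feature_variants(raw_x, scores):
    content = list(raw_x)                # content-based: the raw input itself
    score = list(scores)                 # score-based: per-class SVM scores
    rank = sorted(scores, reverse=True)  # rank-based: scores sorted descending
    return content, score, rank

# Prediction scores for C1..C4 from the slide; raw_x is a dummy input.
_, score_feat, rank_feat = meta_feature_variants([0, 1, 0], [0.9, -0.2, 0.7, -0.6])
# rank_feat == [0.9, 0.7, -0.2, -0.6]
```

Sorting discards which class produced each score, which is exactly what makes the rank-based variant insensitive to per-class score bias.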

  9. MetaLabeler Prediction
     Given one instance: obtain the ranking over all labels; use the meta model to predict the number of labels; pick the top-ranking labels.
     MetaLabeler is easy to implement: existing SVM packages/software can be used directly. It can also be combined with a hierarchical structure easily: simply build a meta model at each internal node.
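Putting the pieces together, prediction is top-k selection where k comes from the meta model; a minimal sketch with hypothetical names, using a stub in place of a trained meta model:

```python
def metalabeler_predict(scores, meta_model):
    # scores: dict mapping each label to its One-vs-Rest SVM score.
    k = meta_model(scores)  # meta model predicts the number of labels
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

scores = {"C1": 0.9, "C2": -0.2, "C3": 0.7, "C4": -0.6}
# Stub meta model standing in for the trained regressor.
labels = metalabeler_predict(scores, meta_model=lambda s: 2)
# labels == {"C1", "C3"}
```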

  10. Baseline Methods
     Existing thresholding methods (Yang 2001):
     • Rank-based cut (RCut): output a fixed number of top-ranking labels for each instance
     • Proportion-based cut: for each label, choose a proportion of the test instances as positive; not applicable for online prediction
     • Score-based cut (SCut, a.k.a. threshold tuning): for each label, determine a threshold based on cross-validation; tends to overfit and is not very stable
     MetaLabeler can be viewed as a local RCut method: it customizes the number of labels for each instance.
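The two baselines applicable to online prediction can be sketched in the same setting (the zero thresholds below are illustrative, not tuned values):

```python
def rcut(scores, k):
    # Rank-based cut: the same fixed top-k for every instance.
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def scut(scores, thresholds):
    # Score-based cut: keep each label whose score clears its own
    # (cross-validation-tuned) per-label threshold.
    return {c for c, s in scores.items() if s >= thresholds[c]}

scores = {"C1": 0.9, "C2": -0.2, "C3": 0.7, "C4": -0.6}
# rcut(scores, 2) == {"C1", "C3"}
# scut(scores, dict.fromkeys(scores, 0.0)) == {"C1", "C3"}
```

The contrast with MetaLabeler is that RCut's k is global while MetaLabeler chooses k per instance, and SCut needs one tuned threshold per label while MetaLabeler needs none.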

  11. Publicly Available Benchmark Data
     • Yahoo! Web Page Classification: 11 data sets, each constructed from a top-level category; 2nd-level topics are the categories; 16-32K instances, 6-15K features, 14-23 categories; 1.2-1.6 labels per instance, maximum 17 labels; each label has at least 100 instances
     • RCV1, a large-scale text corpus: 101 categories, 3.2 labels per instance; for evaluation purposes, 3000 instances for training and 3000 for testing; highly skewed distribution (some labels have only 3-4 instances)

  12. MetaLabeler with Different Meta Features
     Which type of meta feature is more predictive? [Bar charts: Exact Match Ratio, Micro-F1 and Macro-F1 on the Yahoo! and RCV1 data sets, comparing content-, score- and rank-based meta features.] The content-based MetaLabeler outperforms the other meta features.
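Two of the metrics reported in these comparisons can be computed directly from label sets; a minimal sketch of the standard definitions (Macro-F1, not shown, averages per-label F1 scores instead of pooling counts):

```python
def exact_match_ratio(y_true, y_pred):
    # Fraction of instances whose predicted label set is exactly right.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def micro_f1(y_true, y_pred):
    # Pool true positives, false positives and false negatives
    # over all instances and labels before computing F1.
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn)

y_true = [{"C1", "C3"}, {"C2"}]
y_pred = [{"C1", "C3"}, {"C2", "C4"}]
# exact_match_ratio == 0.5; micro_f1: tp=3, fp=1, fn=0 -> 6/7
```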

  13. Performance Comparison
     [Bar charts: Exact Match Ratio, Micro-F1 and Macro-F1 on the Yahoo! and RCV1 data sets, comparing SVM, RCut, SCut and MetaLabeler.] MetaLabeler tends to outperform the other methods.

  14. Bias with MetaLabeler
     The distribution of the number of labels is imbalanced: most instances have a small number of labels, while a small portion of instances have many more. This imbalanced distribution leads to a bias in MetaLabeler: it prefers to predict fewer labels, and only predicts many labels with strong confidence. [Histogram: label distribution on the Yahoo! Society data, ground truth vs. MetaLabeler prediction, over 1-7 labels per instance.]

  15. Scalability Study
     [Line chart: computation time (seconds) vs. number of training samples (1000-10000) on the Yahoo! Society data, comparing SVM, MetaLabeler and threshold tuning.] Threshold tuning requires cross-validation, otherwise it overfits. MetaLabeler simply adds some meta labels and learns One-vs-Rest SVMs.

  16. Scalability Study (cont.)
     • Threshold tuning: cost increases linearly with the number of categories in the data; e.g. 6000 categories means 6000 thresholds to tune
     • MetaLabeler: cost is upper-bounded by the maximum number of labels on any one instance; e.g. with 6000 categories but at most 15 labels per instance, we just need to learn 15 additional binary SVMs
     • The meta model is independent of the number of categories

  17. Application to Large Scale Query Categorization
     • 1.5 million unique queries: 1M for training, 0.5M for testing; 120K features
     • An 8-level taxonomy of 6433 categories
     • Multiple labels; e.g. "0% interest credit card no transfer fee" maps to:
       Financial Services/Credit, Loans and Debt/Credit/Credit Card/Balance Transfer
       Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low Interest Card
       Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low-No-fee Card
     • Label distribution: 81% of queries have 1 label, 16% have 2 labels, 3% have 3+ labels; 1.23 labels on average, at most 26 labels

  18. Flat Model
     The flat model does not leverage the hierarchical structure. Threshold tuning on the training data alone takes 40 hours, while MetaLabeler costs 2 hours. [Chart: Micro-F1 vs. taxonomy depth (1-8) for SVM, MetaLabeler and threshold tuning.]

  19. Hierarchical Model - Training
     At each internal node of the taxonomy: Step 1: generate training data; Step 2: roll up labels to the node's children; Step 3: create an "Other" category; Step 4: train a One-vs-Rest SVM. [Diagram: a taxonomy rooted at Root, with training data rolled up into new training data at each node.]

  20. Hierarchical Model - Prediction
     Given a query q, predict using the SVMs trained at the root level, then recursively descend into each predicted child. Stop when reaching a leaf node or the "Other" category. [Diagram: query q routed from the root through internal nodes m1-m4 to categories c1-c3, stopping at leaves and at "Other".]
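The top-down descent can be sketched with a toy taxonomy; the tree and the stub classifier below are hypothetical stand-ins for the per-node One-vs-Rest SVMs:

```python
# Internal nodes map to their children; anything absent is a leaf.
taxonomy = {
    "Root": ["Financial Services", "Sports", "Other"],
    "Financial Services": ["Credit Card", "Loans", "Other"],
}

def classify(node, query):
    # Stub for the One-vs-Rest SVMs trained at `node` (hypothetical).
    stub = {
        ("Root", "0% interest credit card"): ["Financial Services"],
        ("Financial Services", "0% interest credit card"): ["Credit Card"],
    }
    return stub.get((node, query), [])

def predict_top_down(node, query):
    # Descend into every predicted child; stop at leaves or at "Other".
    labels = []
    for child in classify(node, query):
        if child == "Other":
            continue  # the query does not belong to any deeper category here
        if child in taxonomy:
            labels.extend(predict_top_down(child, query))  # internal: recurse
        else:
            labels.append(child)  # leaf: emit the category
    return labels

# predict_top_down("Root", "0% interest credit card") == ["Credit Card"]
```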

  21. Hierarchical Model + MetaLabeler
     Precision decreases by 1-2%, but recall improves by 10% at deeper levels. [Chart: precision and recall vs. depth (1-8) for MetaLabeler and SVM.]

  22. Features in MetaLabeler
     Example features and their related categories:
     • Overstock.com / discount department stores: Mass Merchants; Apparel & Jewelry; Electronics & Appliances; Home & Garden; Books-Movies-Music-Tickets
     • Blizard: Toys & Hobbies / Video Game; Computing / Computer Game Software; Entertainment & Social Event / Fast Food Restaurant; Reference/News/Weather Information
     • Threading: Books-Movies-Music-Tickets / Computing Books; Computing / Programming; Health and Beauty / Unwanted Hair; Toys and Hobbies / Sewing

  23. Conclusions & Future Work
     • MetaLabeler is promising for large-scale multi-label classification
     • Core idea: learn a meta model to predict the number of labels
     • Simple, efficient and scalable: uses existing SVM software directly; easy for practical deployment
     • Future work: how to optimize MetaLabeler for a desired performance target, e.g. > 95% precision; application to social-networking-related tasks

  24. Questions?

  25. References
     • Liu, T., Yang, Y., Wan, H., Zeng, H., Chen, Z., and Ma, W. 2005. Support vector machines classification with a very large-scale taxonomy. SIGKDD Explor. Newsl. 7, 1 (Jun. 2005), 36-43.
     • Rifkin, R. and Klautau, A. 2004. In Defense of One-Vs-All Classification. J. Mach. Learn. Res. 5 (Dec. 2004), 101-141.
     • Yang, Y. 2001. A study of thresholding strategies for text categorization. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '01). ACM, New York, NY, 137-145.
