Using Classification Trees for News Popularity Analysis


Discover how decision trees and random forests can be used to predict news popularity, comparing the two algorithms' performance on the same dataset. Explore the method of iteratively splitting on variables to evaluate group homogeneity, and the key factors influencing online news popularity. Consider the strengths and limitations of decision trees and random forests, and compare their accuracy, sensitivity, and specificity in this context. Find out why other machine learning algorithms may be needed to improve performance on this dataset.

  • News
  • Popularity
  • Classification Trees
  • Random Forests
  • Machine Learning


Presentation Transcript


  1. Using Classification Trees to Decide News Popularity XINYUE LIU

  2. The News Popularity Data Set Goal: compare the performance of the two algorithms on the same dataset. Outline: source and origin, goal, instances and attributes, examples.

  3. Data Description and Significant Variable Selection

  4. Decision Tree Iteratively split on variables into groups; evaluate homogeneity within each group; split again if necessary. (Slide diagram: an example split into Weekends vs. Weekdays.)
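The split-and-evaluate loop on this slide can be sketched in a few lines. The slides do not name a homogeneity measure, so Gini impurity (one common choice) is assumed here; the feature names and toy data are purely illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: how mixed a group's class labels are (0 = pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature):
    """Try every threshold on one numeric feature and return the split
    that minimizes the weighted Gini impurity of the two groups."""
    best = (None, float("inf"))
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Toy example: split articles on a hypothetical "weekend" indicator (0/1)
rows = [{"weekend": 0}, {"weekend": 0}, {"weekend": 1}, {"weekend": 1}]
labels = ["unpopular", "unpopular", "popular", "popular"]
print(best_split(rows, labels, "weekend"))  # (0, 0.0): a perfectly pure split
```

A full tree repeats `best_split` across all features, keeps the best one, partitions the data, and recurses on each group until it is homogeneous enough (or a depth limit is hit).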

  5. Decision Tree Pros: - Easy to interpret - Better performance in nonlinear settings Cons: - Can overfit without pruning - Unstable, high-variance predictions

  6. Random Forest Bootstrap: random sampling with replacement. Draw bootstrap samples and random subsets of variables, grow many trees, and average (or vote) their predictions.
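A minimal sketch of the two ideas on this slide, bootstrap resampling and aggregating many trees by majority vote. The "trees" below are hypothetical stand-ins (simple thresholded rules) for real trees grown on different bootstrap samples and feature subsets; the feature names are illustrative, not from the dataset.

```python
import random
from collections import Counter

def bootstrap_sample(rows, labels, rng):
    """Random sampling with replacement: same size as the original data,
    but some rows repeat and others are left out."""
    idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
    return [rows[i] for i in idx], [labels[i] for i in idx]

def forest_predict(trees, row):
    """Aggregate by majority vote across the individual trees
    (for regression one would average instead)."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stump-like "trees", each seeing the data differently:
trees = [
    lambda r: "popular" if r["images"] > 2 else "unpopular",
    lambda r: "popular" if r["weekend"] == 1 else "unpopular",
    lambda r: "popular" if r["images"] > 5 else "unpopular",
]
print(forest_predict(trees, {"images": 4, "weekend": 1}))  # "popular" (2 of 3 votes)
```

Resampling both rows and candidate variables decorrelates the trees, which is what lets averaging reduce the single tree's variance.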

  7. Random Forest Pros: - Accuracy Cons: - Speed - Hard to interpret - Overfitting

  8. Solution Random Forest: Accuracy 0.5056, Sensitivity 0.4552, Specificity 0.5273. Decision Tree: Accuracy 0.5068, Sensitivity 0.4628, Specificity 0.5339.
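The three metrics reported on this slide come from a 2x2 confusion matrix. A sketch of how they are computed, assuming the "popular" class is treated as positive (the slides do not say which class was positive); the label names and toy data are illustrative.

```python
def confusion_metrics(actual, predicted, positive="popular"):
    """Accuracy, sensitivity (true-positive rate) and specificity
    (true-negative rate) from paired lists of true/predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

actual    = ["popular", "popular", "unpopular", "unpopular"]
predicted = ["popular", "unpopular", "unpopular", "unpopular"]
print(confusion_metrics(actual, predicted))
# {'accuracy': 0.75, 'sensitivity': 0.5, 'specificity': 1.0}
```

Note that with both accuracies near 0.50 on a two-class problem, the models are barely better than a coin flip, which motivates the conclusion on the next slide.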

  9. Conclusion Neither algorithm performs well on this dataset. The day the news was published, the topic, and the number of images are the main drivers of online news popularity. Other machine learning algorithms should be tried. (Slide diagram: Weekends vs. Weekdays split.)

  10. References Trevor Hastie and Rob Tibshirani. Statistical Learning. Stanford University Online CourseWare, 21 August 2015. Lecture. http://online.stanford.edu/course/statistical-learning-winter-2014 Andrew Liu. Practical Machine Learning. Coursera, 21 August 2015. Lecture.

  11. Thank you!
