
Using Classification Trees for News Popularity Analysis
Discover how decision trees and random forests are used to predict news popularity by comparing the two algorithms' performance on the same dataset. Explore the method of iteratively splitting on variables to evaluate group homogeneity, and the key factors influencing online news popularity. Consider the strengths and limitations of decision trees and random forests, and examine their accuracy, sensitivity, and specificity in this context. Find out why other machine learning algorithms may be needed to improve performance on this dataset.
Presentation Transcript
Using Classification Trees to Decide News Popularity XINYUE LIU
The News Popularity Data Set
Goal: compare the performance of the two algorithms on the same dataset.
- Source and Origin
- Goal
- Instances and Attributes
- Examples
Data Description: Significant Variable Selection
Decision Tree
- Iteratively split variables into groups
- Evaluate homogeneity within each group
- Split again if necessary
(Figure: example tree splitting on weekends vs. weekdays)
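The split-evaluate-split procedure above can be sketched with scikit-learn. The two features and the popularity rule below are made-up stand-ins for illustration, not the presentation's actual data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Toy features: [is_weekend, num_images]; label: popular (1) or not (0)
is_weekend = rng.integers(0, 2, size=(200, 1))
num_images = rng.integers(0, 10, size=(200, 1))
X = np.hstack([is_weekend, num_images])
# Hypothetical rule: weekend articles with several images are popular
y = ((X[:, 0] == 1) & (X[:, 1] > 2)).astype(int)

# A shallow tree: each internal node is one variable split,
# chosen to make the resulting groups as homogeneous as possible
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.score(X, y))
```

Because the toy rule needs only two splits (weekend, then image count), a depth-3 tree separates the classes perfectly; real data rarely splits this cleanly, which is where pruning and depth limits matter.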
Decision Tree
Pros:
- Easy to interpret
- Better performance in nonlinear settings
Cons:
- Without pruning, can lead to overfitting
- High variance: small changes in the data can alter the tree
Random Forest
- Bootstrap: random sampling with replacement
- Draw bootstrap samples of observations and random subsets of variables
- Grow multiple trees and average their predictions
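The three steps above map directly onto scikit-learn's forest parameters. This is a minimal sketch on synthetic data (the nonlinear label rule is invented for the example):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)  # hypothetical nonlinear rule

forest = RandomForestClassifier(
    n_estimators=100,     # grow many trees
    bootstrap=True,       # each tree sees a bootstrap sample (with replacement)
    max_features="sqrt",  # random subset of variables at each split
    random_state=1,
)
forest.fit(X, y)
print(forest.score(X, y))  # accuracy of the averaged vote on the training data
```

Averaging many decorrelated trees is what buys the accuracy gain, and it is also why the ensemble is slower and harder to interpret than a single tree.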
Random Forest
Pros:
- Accuracy
Cons:
- Speed
- Hard to interpret
- Overfitting
Solution

              Random Forest   Decision Tree
Accuracy      0.5056          0.5068
Sensitivity   0.4552          0.4628
Specificity   0.5273          0.5339
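All three reported metrics derive from a 2x2 confusion matrix. A small sketch of the definitions, using invented counts (not the study's actual confusion matrix):

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # all correct / all cases
    sensitivity = tp / (tp + fn)                # true-positive rate
    specificity = tn / (tn + fp)                # true-negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts: 1000 popular articles, 1000 unpopular ones
acc, sens, spec = metrics(tp=455, fn=545, fp=473, tn=527)
print(acc, sens, spec)
```

Sensitivity and specificity near 0.5 on a balanced problem mean the classifier is barely better than a coin flip, which is why the slide's numbers motivate trying other algorithms.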
Conclusion
- Neither algorithm performs well on this dataset.
- The day the news is published, the topic, and the number of images drive the popularity of online news.
- Resort to other machine learning algorithms to improve performance.
References
- Trevor Hastie and Rob Tibshirani. Statistical Learning. Stanford University Online CourseWare, 21 August 2015. Lecture. http://online.stanford.edu/course/statistical-learning-winter-2014
- Andrew Liu. Practical Machine Learning. Coursera, 21 August 2015. Lecture.