
Using Classification Trees for News Popularity Analysis
Discover how decision trees and random forests are used to predict news popularity by comparing the two algorithms' performance on the same dataset. Explore the method of iteratively splitting on variables to evaluate group homogeneity, and the key factors influencing online news popularity. Consider the strengths and limitations of decision trees and random forests, and examine their accuracy, sensitivity, and specificity in this context. Find out why other machine learning algorithms may be needed to improve performance on this dataset.
Presentation Transcript
Using Classification Trees to Decide News Popularity XINYUE LIU
The News Popularity Data Set
Goal: compare the performance of the two algorithms on the same dataset.
- Source and Origin
- Goal
- Instances and Attributes
- Examples
Data Description: Significant Variable Selection
Decision Tree
- Iteratively split variables into groups
- Evaluate homogeneity within each group
- Split again if necessary
(Figure: example tree splitting on weekends vs. weekdays)
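The split-evaluate-split procedure above can be sketched with scikit-learn. The two features and the popularity rule below are made-up stand-ins for illustration, not the presentation's actual data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Toy features: [is_weekend, num_images]; label: popular (1) or not (0)
is_weekend = rng.integers(0, 2, size=(200, 1))
num_images = rng.integers(0, 10, size=(200, 1))
X = np.hstack([is_weekend, num_images])
# Hypothetical rule: weekend articles with several images are popular
y = ((X[:, 0] == 1) & (X[:, 1] > 2)).astype(int)

# A shallow tree: each internal node is one variable split,
# chosen to make the resulting groups as homogeneous as possible
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.score(X, y))
```

Because the toy rule needs only two splits (weekend, then image count), a depth-3 tree separates the classes perfectly; real data rarely splits this cleanly, which is where pruning and depth limits matter.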
Decision Tree
Pros:
- Easy to interpret
- Better performance in nonlinear settings
Cons:
- Without pruning, can lead to overfitting
- High variance: small changes in the data can alter the tree
Random Forest
- Bootstrap: random sampling with replacement
- Draw bootstrap samples of observations and random subsets of variables
- Grow multiple trees and average their predictions
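The three steps above map directly onto scikit-learn's forest parameters. This is a minimal sketch on synthetic data (the nonlinear label rule is invented for the example):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)  # hypothetical nonlinear rule

forest = RandomForestClassifier(
    n_estimators=100,     # grow many trees
    bootstrap=True,       # each tree sees a bootstrap sample (with replacement)
    max_features="sqrt",  # random subset of variables at each split
    random_state=1,
)
forest.fit(X, y)
print(forest.score(X, y))  # accuracy of the averaged vote on the training data
```

Averaging many decorrelated trees is what buys the accuracy gain, and it is also why the ensemble is slower and harder to interpret than a single tree.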
Random Forest
Pros:
- Accuracy
Cons:
- Speed
- Hard to interpret
- Overfitting
Solution

              Random Forest   Decision Tree
Accuracy      0.5056          0.5068
Sensitivity   0.4552          0.4628
Specificity   0.5273          0.5339
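All three reported metrics derive from a 2x2 confusion matrix. A small sketch of the definitions, using invented counts (not the study's actual confusion matrix):

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # all correct / all cases
    sensitivity = tp / (tp + fn)                # true-positive rate
    specificity = tn / (tn + fp)                # true-negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts: 1000 popular articles, 1000 unpopular ones
acc, sens, spec = metrics(tp=455, fn=545, fp=473, tn=527)
print(acc, sens, spec)
```

Sensitivity and specificity near 0.5 on a balanced problem mean the classifier is barely better than a coin flip, which is why the slide's numbers motivate trying other algorithms.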
Conclusion
- Neither algorithm performs well on this dataset.
- The day the news is published, the topic, and the number of images drive the popularity of online news.
- Resort to other machine learning algorithms to improve performance.
References
- Trevor Hastie and Rob Tibshirani. Statistical Learning. Stanford University Online CourseWare, 21 August 2015. Lecture. http://online.stanford.edu/course/statistical-learning-winter-2014
- Andrew Liu. Practical Machine Learning. Coursera, 21 August 2015. Lecture.