
Identifying Bot Accounts in Twitter Using Machine Learning Approaches
Detect and combat the prevalence of bot accounts on Twitter through the application of machine learning techniques, with the aim of safeguarding genuine users from misinformation and malicious activities. Learn about the motivation, earlier research, research improvement targets, methodology, available public datasets, and relevant references in this field. Explore the use of Random Forest, Support Vector Machine, and Artificial Neural Network models in bot detection.
Presentation Transcript
Identifying Bot Accounts in Twitter Using Machine Learning Approaches. Supervisor: Dr. E. Y. A. Charles. Ajaya Karki, 2015/CSC/FS/040
Motivation: More than 50% of the activity on Twitter comes from bots, algorithmically automated accounts created to advertise products, distribute spam, or sway public opinion. Such accounts can leave people financially or psychologically vulnerable and can adversely affect election campaigns. Detecting bots is therefore necessary to identify bad actors in the Twitter-verse and to protect genuine users from misinformation and malicious intent.
Earlier Work: Previous research in this field compared a neural network (NN) with gradient boosting machine (GBM) and logistic regression models. The NN achieved the best F-score, while the GBM had a higher area under the precision-recall curve (AUPRC) and better accuracy on its most confident positive predictions. Gradient boosting classifiers are considered state-of-the-art models for this identification task.
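The three measures compared in that work capture different things: the F-score balances precision and recall at a single decision threshold, AUPRC summarizes precision across all thresholds, and the accuracy of the most confident positive predictions is the precision among the top-ranked accounts. The following is a minimal pure-Python sketch, assuming binary labels (1 = bot, 0 = human) and one classifier score per account. AUPRC is estimated here as average precision; the function names and toy data are illustrative assumptions, not taken from the earlier work.

```python
"""Three ways to score a bot classifier (labels: 1 = bot, 0 = human)."""


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the bot class at one threshold."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_by_score(y_true, scores):
    """Labels ordered from most to least confident bot prediction."""
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in pairs]


def average_precision(y_true, scores):
    """AUPRC estimate: mean of the precision measured at each true bot's rank."""
    hits, total = 0, 0.0
    for rank, label in enumerate(rank_by_score(y_true, scores), start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def precision_at_k(y_true, scores, k):
    """Fraction of bots among the k most confident positive predictions."""
    return sum(rank_by_score(y_true, scores)[:k]) / k


if __name__ == "__main__":
    # Hypothetical toy data: true labels and classifier scores for six accounts.
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(round(f1_score(y_true, y_pred), 3))           # 0.667
    print(round(average_precision(y_true, scores), 3))  # 0.806
    print(precision_at_k(y_true, scores, 2))            # 0.5
```

A model can therefore win on one measure and lose on another, which is consistent with the NN having the best F-score while the GBM ranked its most confident predictions better.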
Research Improvement Target: The earlier research leaves a gap to fill. It relied only on user statistics (verification status, follower count, favorites count) and tweet statistics (number of mentions per tweet, number of hashtags, etc.), so it can still be improved with text-based features. Tweet sentiment, extracted topics, and the words used could all help improve prediction accuracy. Which technique is best suited to these features still needs to be identified, and the decision model itself can also be improved.
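As a sketch of how text-based features could sit alongside the existing user and tweet statistics, the code below builds one numeric feature vector per account. It assumes user records carry `verified`, `followers_count`, and `favourites_count` fields, as in Twitter API v1.1 user objects. The tiny sentiment lexicon is a hypothetical placeholder for a real sentiment model, the feature names are illustrative, and topic extraction is left out.

```python
import re

# Hypothetical toy lexicon standing in for a real sentiment model.
POSITIVE = {"good", "great", "love", "win", "happy"}
NEGATIVE = {"bad", "hate", "scam", "fake", "angry"}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")
TWEET_KEYS = ("mentions", "hashtags", "urls", "sentiment", "unique_word_ratio")


def tweet_features(text):
    """Tweet statistics plus simple text-based features for one tweet."""
    urls = URL_RE.findall(text)
    # Remove URLs first so their fragments are not counted as words.
    tokens = [t.lower() for t in TOKEN_RE.findall(URL_RE.sub(" ", text))]
    words = [t for t in tokens if t[0] not in "#@"]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {
        "mentions": sum(t.startswith("@") for t in tokens),
        "hashtags": sum(t.startswith("#") for t in tokens),
        "urls": len(urls),
        # Lexicon score in [-1, 1]; 0.0 when no lexicon word appears.
        "sentiment": (pos - neg) / (pos + neg) if pos + neg else 0.0,
        # Vocabulary variety; tweets that repeat words score lower.
        "unique_word_ratio": len(set(words)) / len(words) if words else 0.0,
    }


def account_vector(user, tweets):
    """User statistics followed by per-account averages of tweet features."""
    per_tweet = [tweet_features(t) for t in tweets]
    n = len(per_tweet)
    averages = [sum(f[k] for f in per_tweet) / n if n else 0.0 for k in TWEET_KEYS]
    return [
        float(user["verified"]),
        float(user["followers_count"]),
        float(user["favourites_count"]),
    ] + averages


if __name__ == "__main__":
    user = {"verified": False, "followers_count": 12, "favourites_count": 3}
    tweets = ["Great win! Love it @alice #promo #deal https://t.co/x",
              "fake fake scam"]
    print(account_vector(user, tweets))
```

Keeping the output a fixed-length list of numbers means the same vectors can be fed unchanged to any of the candidate classifiers, so the effect of adding text features can be measured by comparing runs with and without the last five entries.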
Methodology: Random Forest, Support Vector Machine (SVM), and Artificial Neural Network (ANN).
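Of the three listed models, a linear SVM is the simplest to sketch without external libraries. The code below trains one with the Pegasos stochastic sub-gradient algorithm on the regularized hinge loss; a real study would more likely use a library implementation (for example scikit-learn's `SVC` or `LinearSVC`) and possibly a non-linear kernel. The two features, the toy data, and the parameter values (`lam`, `epochs`, `seed`) are hypothetical assumptions chosen for illustration.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient descent on hinge loss).

    X: feature vectors; y: labels, +1 = bot, -1 = human. A constant 1.0 is
    appended to every vector so the last weight acts as a bias term (which
    Pegasos then also regularizes, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    for t in range(1, epochs * len(data) + 1):
        x, label = rng.choice(data)
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = label * sum(wi * xi for wi, xi in zip(w, x))
        # Gradient step for the L2 regularizer: shrink all weights.
        w = [(1.0 - eta * lam) * wi for wi in w]
        # Hinge-loss sub-gradient step, only when the margin is below 1.
        if margin < 1.0:
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (bot) or -1 (human) from the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical, normalized features: [tweet rate, follower/friend ratio].
    X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
         [0.1, 0.8], [0.2, 0.9], [0.15, 0.7]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])
```

Wrapping the Random Forest and ANN behind the same train/predict interface would let all three models be compared on identical features and data splits.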
Available Public Datasets: Following the references of previous research, I obtained updated public datasets from the Botrepository hosted by Indiana University. It contains data on various aspects of the accounts, which will help in conducting tests across a wide range of cases. It holds account information and tweets for nearly 6,000 accounts collected between 2009 and 2018.
References
Nayyar, Sahil, and Jessica Wetstone. "I Spot a Bot: Building a Binary Classifier to Detect Bots on Twitter." 2017.
Varol, Onur, Emilio Ferrara, Clayton A. Davis, Filippo Menczer, and Alessandro Flammini. "Online Human-Bot Interactions: Detection, Estimation, and Characterization." ICWSM, 2017.
Gorwa, Robert. "Twitter Has a Bot Problem and Wikipedia Might Be the Solution." Quartz Media, 2017.