Extraction of Features for Context-Aware Rumor Source Detection

Explore the process of extracting features for context-aware rumor detection, focusing on using datasets from Twitter and Weibo. Learn about dataset selection, model learning, data visualization, website construction, and future work in this research domain.

  • Rumor detection
  • Feature extraction
  • Context-aware
  • Dataset analysis
  • Data visualization


Presentation Transcript


  1. Context-aware Rumor Source Detection

  2. Contents: (1) Dataset and feature extraction, (2) Model learning, (3) Data visualization, (4) Website construction, (5) Future work

  3. Contents: (1) Dataset and feature extraction, (2) Model learning, (3) Data visualization, (4) Website construction, (5) Future work

  4. Dataset: Twitter or Weibo? Most papers focus on Twitter tweets, so Twitter has been studied in more detail than Weibo, and some features of Twitter data may not perform well on Weibo data. Twitter datasets are usually smaller than Weibo ones: Twitter only allows users to crawl tweets from within the last seven days, while Weibo imposes fewer restrictions. Moreover, Twitter's terms of use prevent researchers from releasing their Twitter data, so it is hard to obtain a large dataset from prior work. We therefore choose Weibo data as our research object.

  5. Dataset: We use the Weibo dataset from the article Detecting Rumors from Microblogs with Recurrent Neural Networks [1]. This corpus contains 4,664 labeled events in total. For each event, it records blog information and account information for the original event message and all subsequent posts.
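A minimal loading sketch. The file layout below (one JSON file per event, holding a "label" field and a "posts" list) is an assumption for illustration; the actual release of the corpus may be organized differently:

```python
import json
from pathlib import Path

DATA_DIR = Path("weibo_dataset")  # hypothetical directory name

def load_events(data_dir: Path):
    """Yield (label, posts) pairs, one per labeled event."""
    for event_file in sorted(data_dir.glob("*.json")):
        with event_file.open(encoding="utf-8") as f:
            event = json.load(f)
        # label: 1 = rumor, 0 = non-rumor; posts: original message first
        yield event["label"], event["posts"]

events = list(load_events(DATA_DIR))
print(f"Loaded {len(events)} labeled events")  # expected: 4664
```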

  6. Feature Extraction: In the process of feature extraction, we not only use the features already in the dataset but also use the Weibo API and Python crawler tools to obtain more relevant data features. Because the Weibo API has strict permission management, we use its blog-locating function to find each post and then use a crawler to collect more information about the corresponding blog, as sketched below.
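A hedged crawler sketch. The URL template and JSON field names here are placeholders rather than the real Weibo API; they stand in for whatever the blog-locating step plus a page crawl would return:

```python
import time
import requests

BLOG_URL = "https://example.com/weibo/blog/{blog_id}"  # placeholder endpoint

def fetch_blog(blog_id: str, session: requests.Session) -> dict:
    """Fetch one blog page and pull out the extra features we need."""
    resp = session.get(BLOG_URL.format(blog_id=blog_id), timeout=10)
    resp.raise_for_status()
    data = resp.json()  # field names below are illustrative assumptions
    return {
        "reposts": data.get("reposts_count", 0),
        "comments": data.get("comments_count", 0),
        "favorites": data.get("attitudes_count", 0),
    }

with requests.Session() as session:
    for blog_id in ["1234567890"]:  # ids located via the API
        print(fetch_blog(blog_id, session))
        time.sleep(1)  # be polite: rate-limit the crawl
```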

  7. Feature Extraction: Blog Features (combined with the account features on the next slide in the sketch after slide 8)
    • Repost number
    • Context keywords
    • Context emotion words (based on an emotion dictionary)
    • Attitude towards the posted blog (support, deny, evidence, or other?)
    • Comment number
    • Favorite number
    • Whether it contains a website
    • Create time and send time
    • The emojis

  8. Feature Extraction: Account Features
    • User name
    • User description
    • User avatar (whether a definite avatar)
    • User location
    • Bi-followers number, following number, and followers number
    • Whether the account is verified, and which type of verification
    • Friends number
    • Statuses number
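A minimal sketch that assembles the blog features (slide 7) and account features (slide 8) into one numeric vector; all record field names are illustrative assumptions about the crawled data:

```python
import re

EMOJI_RE = re.compile(r"\[[\u4e00-\u9fff\w]+\]")  # Weibo-style [emoji] tokens
URL_RE = re.compile(r"https?://\S+")

def blog_features(blog: dict) -> list:
    """Numeric features from one blog post."""
    text = blog.get("text", "")
    return [
        blog.get("reposts_count", 0),
        blog.get("comments_count", 0),
        blog.get("attitudes_count", 0),           # favorites / likes
        int(bool(URL_RE.search(text))),           # contains a website?
        len(EMOJI_RE.findall(text)),              # number of emojis
    ]

def account_features(user: dict) -> list:
    """Numeric features from the posting account."""
    return [
        int(bool(user.get("verified", False))),
        user.get("followers_count", 0),
        user.get("friends_count", 0),
        user.get("statuses_count", 0),
        int(not user.get("avatar_is_default", True)),  # definite avatar?
    ]

def event_features(blog: dict, user: dict) -> list:
    return blog_features(blog) + account_features(user)
```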

  9. Contents: (1) Dataset and feature extraction, (2) Model learning, (3) Data visualization, (4) Website construction, (5) Future work

  10. Ensemble Learning: Vote Classification. This is a two-class classification problem, which many tools can solve, for instance SVM, decision trees, and logistic regression. Ensemble learning combines various models and draws on each of their advantages. Vote classification is an ensemble-learning framework with three main variants:
    • Hard voting: each classification model outputs a label, and the label with the most votes is chosen.
    • Soft voting: each classification model outputs a probability for each label, and the overall probability is the mean over all models.
    • Soft voting with weights: each classification model is assigned a different weight.
  A sketch of all three variants follows.
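A minimal sketch using scikit-learn's VotingClassifier; the slides do not say which library was actually used, and the weights below are purely illustrative (in practice they would be tuned on validation data):

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

base_models = [
    ("dt", DecisionTreeClassifier()),
    ("svm", SVC(probability=True)),  # probability=True enables soft voting
    ("rf", RandomForestClassifier()),
    ("knn", KNeighborsClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
]

# Hard voting: majority label across models.
hard_vote = VotingClassifier(estimators=base_models, voting="hard")
# Soft voting: mean of predicted class probabilities.
soft_vote = VotingClassifier(estimators=base_models, voting="soft")
# Weighted soft voting: per-model weights (illustrative values).
weighted_vote = VotingClassifier(
    estimators=base_models, voting="soft", weights=[1, 2, 3, 1, 2, 1]
)

# Usage: weighted_vote.fit(X_train, y_train); weighted_vote.predict(X_test)
```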

  11. Experiments with different models:

    Model               | Accuracy | Recall of rumor
    Decision tree       | 0.9206   | 0.92
    SVM                 | 0.913    | 1.00
    Random Forest       | 0.9223   | 0.99
    KNN                 | 0.9078   | 0.95
    Logistic Regression | 0.9228   | 0.95
    Bayes               | 0.9137   | 0.96
    Hard voting         | 0.9314   | 0.90
    Soft voting         | 0.9265   | 0.97
    Soft voting-weight  | 0.9410   | 0.97

  12. Transfer Learning: The aim of transfer learning is to make a classifier trained on the source domain also work well on the target domain (which has no labels). For this task, we choose TCA (Transfer Component Analysis) as the domain-adaptation algorithm. The core idea is to project the data from the source and target domains into a shared subspace. Finally, TCA reduces to an optimization problem:
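For completeness, the standard TCA objective (from Pan et al.'s original formulation, which the description above matches) is:

```latex
\min_{W}\; \operatorname{tr}\!\left(W^{\top} K L K W\right)
         + \mu\,\operatorname{tr}\!\left(W^{\top} W\right)
\qquad \text{s.t.} \qquad W^{\top} K H K W = I
```

Here K is the kernel matrix over the combined source and target samples, L is the coefficient matrix of the MMD (maximum mean discrepancy) between the two domains, H = I - (1/n)11^T is the centering matrix, mu is a trade-off parameter, and W is the projection into the shared subspace.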

  13. Source Detection: Because the dataset is structured as many trees built from repost relations, conventional rumor source detection methods are no longer suitable. We consider the source of a rumor to be either the blog that appeared first or the website from which the report came. Algorithm:
    a) Extract the keywords from a blog identified as a rumor
    b) Search for the keywords in the dataset
    c) Collect the websites in the search results
    d) Find the blog or website that appeared first
    e) Stop when there are no search results
  A sketch of this procedure follows.
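A sketch of steps a)-e). Note that extract_keywords, search_dataset, and extract_websites are hypothetical helpers standing in for, e.g., a TF-IDF keyword extractor and a full-text search over the corpus:

```python
def find_rumor_source(rumor_blog, dataset):
    """Trace a rumor back to the earliest blog, collecting cited websites."""
    current = rumor_blog
    websites = []
    while True:
        keywords = extract_keywords(current["text"])   # step a (hypothetical)
        results = search_dataset(dataset, keywords)    # step b (hypothetical)
        for post in results:                           # step c
            websites.extend(extract_websites(post["text"]))
        # step d: keep only strictly earlier posts; created_at assumed comparable
        earlier = [p for p in results
                   if p["created_at"] < current["created_at"]]
        if not earlier:                                # step e: stop
            break
        current = min(earlier, key=lambda p: p["created_at"])
    # the source is the earliest blog found, or one of the websites it cites
    return current, websites
```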

  14. Contents: (1) Dataset and feature extraction, (2) Model learning, (3) Data visualization, (4) Website construction, (5) Future work

  15. Decision Tree (figure)

  16. The distribution of accounts: For the account features, we use t-SNE to represent the spatial relationships. In the image, the highest-risk accounts form a cluster; the risk degree rises along the color scale from blue through green, red, cyan, and magenta to yellow. A plotting sketch follows.
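A minimal plotting sketch with scikit-learn's t-SNE; the account-feature matrix and per-account risk scores below are random placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

X = np.random.rand(500, 9)   # placeholder account-feature matrix
risk = np.random.rand(500)   # placeholder risk degree, used only for color

# Project the account features to 2-D for visualization.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(embedding[:, 0], embedding[:, 1], c=risk, cmap="viridis", s=8)
plt.colorbar(label="risk degree")
plt.title("t-SNE of account features")
plt.show()
```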

  17. The appearances of websites: the rumor ratio rises along the color scale from purple through blue, cyan, aqua, and green to yellow.

  18. Contents Dataset and feature extraction 1 Model learning 2 Data visualization 3 Website construction 4 5 Future work

  19. Front and back of the website. Front-end: the front end of the website is HTML, rendered with CSS, and some dynamic parts use JavaScript code. Back-end: the back end uses the Django framework. The search bar on the home page can be enlarged to full screen by clicking (realized using JavaScript), and some common rumors are displayed and classified. Clicking a rumor jumps to the corresponding rumor blog page.

  20. Front and back of the website: The site is divided into two pages: the home page for searching, and the result page for displaying the detection information. The result page shows the rumor rate, likes, forwards, and comments of the searched blog. It also lets users judge whether the blog post is a rumor or not, and automatically returns to the main page after the judgment. A minimal view sketch follows.
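A minimal Django view sketch of the two pages. The helpers lookup_blog, predict, and save_judgment are hypothetical stubs standing in for the blog search, the trained voting model, and judgment persistence; template names and fields are likewise illustrative:

```python
from django.shortcuts import redirect, render

COMMON_RUMORS = []  # placeholder: common rumors shown on the home page

def lookup_blog(query):                 # hypothetical search helper
    ...

def predict(blog):                      # hypothetical wrapper around the model
    ...

def save_judgment(blog_id, is_rumor):   # hypothetical persistence helper
    ...

def home(request):
    """Home page: full-screen search bar plus common, classified rumors."""
    return render(request, "home.html", {"common_rumors": COMMON_RUMORS})

def result(request):
    """Result page: rumor rate and blog statistics for the searched post."""
    blog = lookup_blog(request.GET.get("q", ""))
    context = {
        "rumor_rate": predict(blog),     # probability from the voting model
        "likes": blog["attitudes_count"],
        "forwards": blog["reposts_count"],
        "comments": blog["comments_count"],
    }
    return render(request, "result.html", context)

def judge(request):
    """Record the user's rumor / not-rumor judgment, then return home."""
    save_judgment(request.POST.get("blog_id"), request.POST.get("is_rumor"))
    return redirect("home")
```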

  21. Contents: (1) Dataset and feature extraction, (2) Model learning, (3) Data visualization, (4) Website construction, (5) Future work

  22. Future work: 1. At present we tackle this problem from a macro perspective, i.e., we regard the original blog and its reposts as a single entity. We may study the evolution of a rumor during propagation; a graph model such as a CRF may be more suitable. 2. Our rumor source detection is currently tied closely to Weibo, and we want to generalize the model. 3. We are considering improving the whole website, for example with a rumor classification area and a real-time rumor scroll bar.

  23. References
    [1] Detecting Rumors from Microblogs with Recurrent Neural Networks.
    [2] Automatic Detection of Rumoured Tweets and Finding Its Origin.
    [3] The DARPA Twitter Bot Challenge.
    [4] Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter.
