A data collection system for rumor detection

Date
2017
Publisher
University of Delaware
Abstract
A large amount of unsubstantiated and unverified information, known as rumors, is created and propagated through the Internet because posting information online is easy and supervision is lacking. These rumors may confuse users and cause social unrest. To prevent these negative effects, rumor detection based on machine learning has been well studied. Almost all of these machine-learning-based methods rely on a large rumor dataset, which makes a large collection of rumor-related data highly desirable. However, current rumor collection methods are partially manual and usually specific to a single platform.

In this thesis, we propose a rumor collection system that automatically collects rumor-related data from both search engines and social media. It consists of two main parts. First, a query generator produces a set of queries from the user's input, because using the raw input directly as the search query may cause the search to fail. Second, a novel rumor crawler collects rumor-related data using the generated queries.

To validate our rumor collection system, experiments were conducted on tweets from January 2016 to March 2017. Results on 50 different rumors show that, compared with the widely used Twitter Search API, our system crawls more rumor-related data, with an average increase of 3.589 times. Furthermore, for some rumors, our system remains effective when the Twitter Search API returns no results.
Keywords
Applied sciences, Keyword extraction, Rumor collection, Twitter crawler