An information system for rumors checking

Date
2018
Publisher
University of Delaware
Abstract
The rapid development of the Internet has made social media a significant source of news. However, because of the lack of supervision, social media has also become fertile ground for malicious rumors, which emerge primarily during breaking news. Once such rumors spread online, they can do enormous damage to individuals and to society. This thesis develops an information system for checking rumors. The system automatically extracts candidate rumors from tweets, with an average distance of 0.37 between the extracted candidate rumors and the target claims. Using stance classification, the system offers an alternative way to check rumors: it uses the stances that claims from different information sources take toward each candidate rumor to help users judge it. Experimental results show that in most cases this method reaches the same verdicts as the Snopes website. Claim extraction is implemented in three steps: parsing tweets with a dependency parser designed for tweets, merging similar claims into groups with clustering methods, and selecting a representative claim from each group as a candidate rumor based on the proposed features. The stance classifier used in this thesis was proposed by Augenstein et al. and was the state-of-the-art stance classifier on SemEval 2016 Task 6.
To evaluate our rumor exploration system, we tested it on thirty-one events comprising about 84,297 tweets in total. Twenty-two of these events were selected from the Snopes website, and our system collected the tweets for them. The other nine events come from the PHEME dataset, whose data had been collected and labeled by journalists. For twelve of the twenty-two Snopes events, our system extracts the meaningful claims embedded in the tweets, with an average distance of 0.37 to the claims shown on the Snopes website. For the PHEME dataset, the meaningful claims of eight of the nine events are extracted. Thus, across the thirty-one events, our system retrieves the meaningful claims of twenty. Moreover, most of the verdicts our system infers for the candidate rumors match those on the Snopes website and the labels in the PHEME dataset. Based on the evaluation of these twenty events, our system is competitive with the Snopes website, whose content is produced by professionals.
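The grouping and selection steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the distance metric, the clustering methods, or the selection features, so everything here is an assumption made for illustration: token-set Jaccard distance, a greedy single-pass grouping, and medoid selection stand in for the thesis's actual choices, and the names `jaccard_distance`, `cluster_claims`, `representative`, and `candidate_rumors` are hypothetical.

```python
from typing import List


def jaccard_distance(a: str, b: str) -> float:
    """Distance in [0, 1] between the lowercase token sets of two claims.

    Assumption: the thesis's real distance metric is not given in the abstract.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cluster_claims(claims: List[str], threshold: float = 0.6) -> List[List[str]]:
    """Greedy single-pass grouping (a stand-in for the thesis's clustering).

    Each claim joins the first group whose seed claim lies within
    `threshold`; otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    for claim in claims:
        for group in groups:
            if jaccard_distance(claim, group[0]) <= threshold:
                group.append(claim)
                break
        else:
            groups.append([claim])
    return groups


def representative(group: List[str]) -> str:
    """Pick the medoid: the claim with the smallest total distance to its group."""
    return min(group, key=lambda c: sum(jaccard_distance(c, o) for o in group))


def candidate_rumors(claims: List[str], threshold: float = 0.6) -> List[str]:
    """Return one representative claim per group as a candidate rumor."""
    return [representative(g) for g in cluster_claims(claims, threshold)]


if __name__ == "__main__":
    sample = [
        "the bridge collapsed today",
        "the bridge collapsed",
        "bridge collapsed today",
        "aliens landed in ohio",
    ]
    for rumor in candidate_rumors(sample):
        print(rumor)
```

In this sketch the three bridge claims fall into one group and the unrelated claim forms its own, so one candidate per group is returned. A real system would operate on claims produced by the tweet dependency parser rather than on raw strings.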
Keywords
Communication and the arts, Applied sciences, Claim extraction, Rumor checking, Stance classification, Tweets Crawler