A data collection system for rumor detection

Author(s): Wang, Ye
Date Accessioned: 2018-05-08T12:28:15Z
Date Available: 2018-05-08T12:28:15Z
Publication Date: 2017
SWORD Update: 2017-11-10T14:19:43Z
Abstract: Large amounts of unsubstantiated and unverified information, known as rumors, are created and propagated on the Internet because posting information online is easy and supervision is lacking. These rumors may confuse users and cause social unrest. To limit these negative effects, rumor detection based on machine learning has been well studied. Almost all of these machine-learning-based methods rely on a large rumor dataset, which makes a large collection of rumor-related data highly desirable. However, current rumor collection methods are partially manual and usually specific to a single platform.

In this thesis, we propose a rumor collection system that automatically collects rumor-related data from both search engines and social media. It consists of two main parts. First, a query generator produces a set of queries based on the user's input, because using the raw user input directly as the search query may cause the search to fail. Second, a novel rumor crawler collects rumor-related data using the generated queries.

To validate the rumor collection system, experiments were conducted on tweets from January 2016 to March 2017. Results for 50 different rumors show that, compared with the widely used Twitter Search API, our system crawls more rumor-related data, with an average increase of 3.589 times. Furthermore, for some rumors, our system remains effective when the Twitter Search API returns no results.
Advisor: Fang, Hui
Degree: M.S.
Department: University of Delaware, Department of Electrical and Computer Engineering
Unique Identifier: 1034985436
URL: http://udspace.udel.edu/handle/19716/23144
Language: en
Publisher: University of Delaware
URI: https://search.proquest.com/docview/2013189281?accountid=10457
Keywords: Applied sciences; Keyword extraction; Rumor collection; Twitter crawler
Title: A data collection system for rumor detection
Type: Thesis
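
The abstract describes a two-stage pipeline: a query generator that expands the user's input into a set of queries, followed by a crawler that collects data with those queries. The sketch below illustrates that general shape only. It is not the thesis's method: the stopword list, the keyword-subset strategy, and the names `extract_keywords`, `generate_queries`, `collect`, and the `search` callback are all assumptions made for illustration, and the example input is invented.

```python
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "in",
             "on", "to", "and", "that", "by", "for", "with"}


def extract_keywords(text):
    """Return lowercase non-stopword tokens, in order, without duplicates."""
    keywords = []
    for raw in text.lower().split():
        token = raw.strip(".,!?;:\"'()")
        if token and token not in STOPWORDS and token not in keywords:
            keywords.append(token)
    return keywords


def generate_queries(user_input, min_terms=2):
    """Build candidate queries from keyword subsets, longest subsets first.

    Shorter queries are less likely to return nothing than the raw input,
    which is the motivation the abstract gives for generating queries.
    """
    keywords = extract_keywords(user_input)
    if not keywords:
        return []
    min_terms = min(min_terms, len(keywords))
    queries = []
    for size in range(len(keywords), min_terms - 1, -1):
        for combo in combinations(keywords, size):
            queries.append(" ".join(combo))
    return queries


def collect(queries, search):
    """Run each query through `search` and merge results, deduplicated by id.

    `search` is any callable that takes a query string and returns an
    iterable of dicts with an "id" key (a stand-in for a real crawler).
    """
    results = {}
    for query in queries:
        for item in search(query):
            results.setdefault(item["id"], item)
    return list(results.values())


if __name__ == "__main__":
    for query in generate_queries("shark on the flooded highway"):
        print(query)
```

Passing the crawler in as a callback keeps the sketch independent of any particular search engine or social media API; a real collector would also need rate limiting and pagination, which are omitted here.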
Files
Original bundle
Name: Wang_udel_0060M_13017.pdf
Size: 1.24 MB
Format: Adobe Portable Document Format