Given the growing proliferation of false claims online in recent years, there has also been growing research interest in automatically distinguishing false rumors from factually true claims. Here, we propose a general-purpose framework for fully automatic fact checking using external sources, tapping the potential of the entire Web as a knowledge source to confirm or reject a claim. Our framework uses a deep neural network with LSTM text encoding to combine semantic kernels with task-specific embeddings that encode a claim together with pieces of potentially relevant text fragments from the Web, taking the source reliability into account. The evaluation results show good performance on two different tasks and datasets:
- rumor detection and
- fact checking of the answers to a question in community question answering forums.
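As a rough illustration of the architecture described above, the claim and a retrieved Web fragment can each be encoded with an LSTM and merged with hand-crafted similarity features before classification. This is a minimal sketch under assumed dimensions, layer sizes, and input names, not the exact implementation from the paper:

```python
# Sketch of an LSTM-based claim/snippet encoder with similarity features.
# All hyperparameters and input names below are illustrative assumptions.
from tensorflow.keras.layers import Input, Embedding, LSTM, Concatenate, Dense
from tensorflow.keras.models import Model

VOCAB_SIZE, EMB_DIM, MAX_LEN, N_SIM_FEATURES = 20000, 100, 50, 10

claim_ids = Input(shape=(MAX_LEN,), name="claim_tokens")
snippet_ids = Input(shape=(MAX_LEN,), name="snippet_tokens")
sim_features = Input(shape=(N_SIM_FEATURES,), name="similarity_features")

embed = Embedding(VOCAB_SIZE, EMB_DIM)           # shared word embeddings
claim_vec = LSTM(64)(embed(claim_ids))           # encode the claim
snippet_vec = LSTM(64)(embed(snippet_ids))       # encode the Web fragment

merged = Concatenate()([claim_vec, snippet_vec, sim_features])
hidden = Dense(64, activation="relu")(merged)    # task-specific representation
output = Dense(1, activation="sigmoid")(hidden)  # e.g., rumor vs. non-rumor

model = Model([claim_ids, snippet_ids, sim_features], output)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```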
Georgi Karadzhov, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Ivan Koychev
Paper link: https://arxiv.org/abs/1710.00341
Please cite the following paper if you use the resources below:
@InProceedings{RANLP2017:factchecking,
author = {Georgi Karadzhov and Preslav Nakov and Llu\'{i}s M\`{a}rquez and Alberto Barr\'on-Cede\~no and Ivan Koychev},
title = {Fully Automated Fact Checking Using External Sources},
booktitle = {Proceedings of the 2017 International Conference on Recent Advances in Natural Language Processing},
month = {September},
year = {2017},
address = {Varna, Bulgaria},
series = {RANLP~'17}
}
A version of the code is available in the repo. A cleaner (refactored) version of the code will be available soon(-ish).
Resources for the rumor detection task (claims from snopes.com):

Name | Short description | Link |
---|---|---|
Claims | Claims from snopes.com, each labeled as Rumour or Non-rumour. | Download |
Data splits | Exact splits used for training and evaluation of the fact-checking system. | Train-Download Test-Download Development-Download |
Website credibility | Manually annotated list of websites. Possible labels: reputed-source, forum-type, others. | Download |
Web data | Each claim augmented with automatically collected Web data. Also includes all computed similarities and average sentence vectors (see the sketch after this table). | Download |
Best web resources | Only the Web data with the highest similarity to the original claim; used to train the task-specific embeddings. | Download |
Task-specific embeddings | Combined representation of a claim and the supporting web data. | Download |
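The similarities and average sentence vectors mentioned above can be reproduced in spirit as follows: average the word vectors of a text, then rank the retrieved Web fragments by cosine similarity to the claim. The embedding lookup, dimensionality, and `top_k` cutoff are illustrative assumptions (the paper also uses other similarity measures), so treat this as a sketch rather than the released feature-extraction code:

```python
import numpy as np

def avg_sentence_vector(tokens, embeddings, dim=100):
    """Average the word vectors of all in-vocabulary tokens.

    `embeddings` is assumed to be a dict mapping a word to a NumPy
    vector (e.g., loaded from pre-trained word2vec/GloVe files).
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def best_web_resources(claim_tokens, fragments, embeddings, top_k=3):
    """Keep the fragments most similar to the claim, analogous to the
    'Best web resources' files described in the table above."""
    claim_vec = avg_sentence_vector(claim_tokens, embeddings)
    scored = [(cosine_similarity(claim_vec,
                                 avg_sentence_vector(f, embeddings)), f)
              for f in fragments]
    return sorted(scored, key=lambda x: x[0], reverse=True)[:top_k]
```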
Resources for fact checking the answers in community question answering (cQA) forums (Qatar Living):

Name | Short description | Link |
---|---|---|
QAs | Questions and comments from the QatarLiving forum. | Download |
QAs-concatenated | Questions and comments from the QatarLiving forum, concatenated to represent a single entity. This is the data used by the system. | Download |
Data splits | Exact splits used for training and evaluation of the fact-checking system. | Train-Download Test-Download Development-Download |
Website credibility | Manually annotated list of websites. Possible labels: reputed-source, forum-type, others. For the cQA dataset we also annotated whether a website is Qatar-related, as this is relevant to the credibility of an answer. | Download |
Web data | Each QA-pair augmented with automatically collected Web data. Also includes all computed similarities and average sentence vectors. | Download |
Best web resources | Only the Web data with the highest similarity to the original QA-pair; used to train the task-specific embeddings. | Download |
Task-specific embeddings | Combined representation of a QA-pair and the supporting web data. | Download |