This repository explores the following research question: how do combinations of pre-trained embedding techniques and machine learning algorithms perform when classifying fake news? The research is focused on applying transfer learning to earlier work by Wang (2017), whose results are used as a benchmark for performance.
To run the code in the code folder, the following packages must be installed:
flair
allennlp
tensorflow
tensorflow_hub
pytorch
spacy
hypopt
gensim
You can install these packages by running pip install -r /code/requirements.txt.
Two sub-questions guide the research:
At what padding sequence length do neural networks achieve the highest accuracy when classifying fake news?
How well do neural network classification architectures classify fake news compared to non-neural classification algorithms?
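To illustrate what "padding sequence length" means here: token-id sequences of varying length are truncated or zero-padded to one fixed length before being fed to a neural network. The helper below is a hypothetical pure-Python sketch of this step (the repository itself would typically use a library routine such as the Keras-style pad_sequences), with the function name and pad value chosen for illustration:

```python
def pad_sequences(sequences, maxlen, pad_value=0):
    """Truncate or right-pad each token-id sequence to exactly `maxlen` items."""
    padded = []
    for seq in sequences:
        seq = seq[:maxlen]  # truncate sequences longer than maxlen
        padded.append(seq + [pad_value] * (maxlen - len(seq)))  # pad short ones
    return padded

# Two statements of different lengths become fixed-length rows.
statements = [[4, 17, 9], [12, 3, 8, 21, 5, 2]]
print(pad_sequences(statements, maxlen=4))
# → [[4, 17, 9, 0], [12, 3, 8, 21]]
```

The choice of maxlen trades off information loss (long statements get cut) against wasted computation on padding tokens, which is why the first sub-question treats it as a tunable parameter.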
With a combination of BERT embeddings and logistic regression, an accuracy of 52.96% on 3 labels can be achieved, an increase of almost 4 percentage points over previous research that used only traditional linguistic methods. On the original 6 labels, this combination achieves an accuracy of 27.51%, which is 0.51 percentage points better than the original research by Wang (2017).
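The pipeline behind this result can be sketched as: precompute a fixed embedding vector per statement, then fit a logistic regression on those vectors. The snippet below is a minimal pure-Python illustration of the downstream classifier only, using toy 2-D vectors in place of real BERT embeddings (the repository would instead feed flair/BERT document embeddings into a library classifier such as scikit-learn's LogisticRegression); all names and data here are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit weights w and bias b with stochastic gradient descent on log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Toy "embeddings": class 1 clusters near (1, 1), class 0 near (-1, -1).
X = [[1.0, 1.2], [0.9, 1.1], [-1.0, -0.8], [-1.1, -1.2]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
print([predict(w, b, x) for x in X])  # → [1, 1, 0, 0]
```

Because the embeddings are frozen, only the lightweight linear classifier is trained, which is the transfer-learning setup the research question refers to.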