An ML system to verify the veracity of public health claims.
Each claim in the PUBHEALTH dataset carries a veracity label (true, false, unproven, mixture) and an explanation text field: a human-written justification for why the claim was assigned that label.
source: https://huggingface.co/datasets/health_fact
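For orientation, the dataset can be pulled straight from the Hugging Face hub. Below is a minimal sketch, assuming the field names listed on the dataset card; depending on the `datasets` version, `trust_remote_code=True` may additionally be required.

```python
from datasets import load_dataset

# Load the PUBHEALTH splits (train / validation / test) from the hub.
pubhealth = load_dataset("health_fact")
print(pubhealth)

sample = pubhealth["train"][0]
print(sample["claim"])        # the health claim to be verified
print(sample["label"])        # integer veracity label (true / false / unproven / mixture)
print(sample["explanation"])  # human-written justification for the label
```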
Repository contents:
- BERT_fact_checker.ipynb : walks through the implementation step by step
- src/bertClassifier.py : contains the class and helper functions used to initialize and train the BERT model
Required packages:
- transformers
- datasets
- sentence_transformers
- umap-learn
Main imports used in the notebook:

```python
import torch
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from transformers import *  # exposes the library's tokenizer and model classes
from transformers import PegasusForConditionalGeneration, PegasusTokenizer  # Pegasus, used in the data-augmentation step
from src.bertClassifier import *  # project helpers to initialize and train the BERT classifier
```
Notebook outline:
- Load Data
- Preprocess Data
- Build the Model (BERT); a minimal fine-tuning sketch follows this list
- Predict & Evaluate (63% accuracy)
- Data Augmentation + Predict & Evaluate (65% accuracy); a paraphrasing sketch follows this list
- Issues for consideration
- ANNEX - Data visualization
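The classifier itself lives in `src/bertClassifier.py`, whose API is project-specific. As a reference point, here is a minimal sketch of the same preprocess / fine-tune / evaluate loop using stock Hugging Face components; the `bert-base-uncased` checkpoint, the claim-only input, and all hyperparameters are assumptions, not the notebook's exact settings.

```python
import numpy as np
from datasets import load_dataset
from sklearn.metrics import classification_report
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # hypothetical choice; the notebook may use a different checkpoint
NUM_LABELS = 4                    # true / false / unproven / mixture

dataset = load_dataset("health_fact")
dataset = dataset.filter(lambda ex: ex["label"] != -1)  # drop the few instances with an invalid -1 label

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Classify the claim text alone; the notebook may also use main_text or explanation.
    return tokenizer(batch["claim"], truncation=True, padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

model = BertForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

training_args = TrainingArguments(
    output_dir="bert_fact_checker",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()

# Evaluate on the held-out test split.
test_logits = trainer.predict(encoded["test"]).predictions
test_preds = np.argmax(test_logits, axis=-1)
print(classification_report(encoded["test"]["label"], test_preds))
```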
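The Pegasus imports above suggest that the augmentation step generates paraphrases of training claims while keeping their original labels. The following is only a sketch of that idea; the `tuner007/pegasus_paraphrase` checkpoint and the generation settings are assumptions, not the notebook's configuration.

```python
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Hypothetical checkpoint choice; the notebook may use a different Pegasus model.
PARAPHRASE_MODEL = "tuner007/pegasus_paraphrase"

device = "cuda" if torch.cuda.is_available() else "cpu"
pegasus_tokenizer = PegasusTokenizer.from_pretrained(PARAPHRASE_MODEL)
pegasus_model = PegasusForConditionalGeneration.from_pretrained(PARAPHRASE_MODEL).to(device)

def paraphrase(claim: str, num_return_sequences: int = 3) -> list[str]:
    """Generate paraphrases of a claim; each paraphrase reuses the original veracity label."""
    batch = pegasus_tokenizer([claim], truncation=True, padding="longest",
                              max_length=60, return_tensors="pt").to(device)
    outputs = pegasus_model.generate(**batch, max_length=60, num_beams=10,
                                     num_return_sequences=num_return_sequences)
    return pegasus_tokenizer.batch_decode(outputs, skip_special_tokens=True)

print(paraphrase("Vitamin C cures the common cold."))
```

The augmented claims would then be appended to the training split before re-running the fine-tuning step sketched above.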
Please see 'BERT_fact_checker.ipynb' for details.