The API performs scientific verification of claims extracted from climate-change-related news articles in order to detect potential inaccuracies in those articles.
```mermaid
flowchart TB
    subgraph client1 [Streamlit Client]
        A(Media Article text or URL) --> COR{"Co-reference resolution \n (Optional)"}
        COR -->|Split into sentences| S1("Sentence 1")
        COR -->|Split into sentences| S2("Sentence 2")
        COR -->|Split into sentences| SN("...")
        S1 --> CR{"Climate related? \n (Optional)"}
        S2 --> CR
        SN --> CR
        CR -->|Yes| IC{"Is a claim? \n (Optional)"}
        CR -- No --x N[Ignore]
        IC -- No --x N1[Ignore]
    end
    subgraph API
        IC -- Yes ---> E["Retrieve Top k most similar evidences"]
        E:::curAppNode --> R["Re-rank using citation metrics (Optional)"]
        R:::curAppNode --> VC[["Verify with Climate-BERT based model"]]
    end
    subgraph client2 [Streamlit Client]
        R ---> VM[["Verify with MultiVerS"]]
        VC:::curAppNode --> D["Display predictions"]
        VM --> D
    end
    style R stroke:#808080,stroke-width:2px,stroke-dasharray: 5 5
    style CR stroke:#808080,stroke-width:2px,stroke-dasharray: 5 5
    style COR stroke:#808080,stroke-width:2px,stroke-dasharray: 5 5
    style IC stroke:#808080,stroke-width:2px,stroke-dasharray: 5 5
    style API fill:#E9EAE0,color:#E7625F
    classDef curAppNode fill:#F7BEC0,color:#C85250,stroke:#E7625F
    linkStyle 10,11 stroke:#F7BEC0,stroke-width:4px,color:#C85250,background-color:#F7BEC0;
```
The API performs two main tasks, plus one supplementary task:

- Evidence retrieval for given claim(s), under all `evidence` endpoints
- Evidence retrieval + verification for given claim(s), under all `verify` endpoints
- A supplementary task of splitting text into sentences, under the `split` endpoint, to support the Chrome extension
Please refer to the dedicated Evidence database creation section.
Retrieval is performed with the Haystack framework, which facilitates fast dense vector retrieval, using the sentence encoder models described here. The number of evidence candidates to retrieve is defined by the `top_k` parameter.
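For illustration, a minimal Haystack 1.x retrieval sketch is shown below; the document store type, the embedding model name, and the indexing step are assumptions rather than the service's actual configuration.

```python
# Illustrative dense-retrieval sketch with Haystack 1.x (requires faiss).
# Store type and encoder name are assumptions, not the API's exact setup.
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever

document_store = FAISSDocumentStore(faiss_index_factory_str="Flat")
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-mpnet-base-v2",  # assumed encoder
)
# document_store.write_documents(...) and update_embeddings(retriever) would
# happen at evidence-database creation time (see the dedicated section).

# top_k controls how many evidence candidates are returned per claim.
candidates = retriever.retrieve(query="Global sea levels are rising.", top_k=5)
for doc in candidates:
    print(doc.meta.get("title"), doc.score)
```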
If the `re_rank` parameter is set to `true`, the following actions are performed (see the sketch after this list):

- For each input claim, 5 more than the set `top_k` most semantically similar evidence candidates are retrieved.
- The candidates are sorted in descending order by the following keys:
  - Number of influential citations, i.e., citations that indicate that the cited work is used or extended in the new effort [1]
  - Total number of citations
  - Publication year
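A minimal sketch of this re-ranking logic follows; the candidate field names are hypothetical, and keeping only the first `top_k` items after sorting is our assumption rather than documented behavior.

```python
# Illustrative sketch of the re-ranking step described above, not the
# service's actual code. Field names are hypothetical placeholders.
from typing import Dict, List


def re_rank(candidates: List[Dict], top_k: int) -> List[Dict]:
    """Sort the top_k + 5 retrieved candidates by citation metrics."""
    ranked = sorted(
        candidates,
        key=lambda c: (
            c["influential_citation_count"],  # influential citations first [1]
            c["citation_count"],              # then total citations
            c["year"],                        # then publication year
        ),
        reverse=True,  # descending order on all three keys
    )
    return ranked[:top_k]  # assumption: only the final top_k are returned
```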
Verification uses the climatebert-fact-checking model available from Hugging Face: a ClimateBERT [2] model fine-tuned on the CLIMATE-FEVER dataset [3].
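As a sketch, such a model can be run on a single claim/evidence pair with the transformers library; the model id and the label set below are assumptions for illustration, not the API's confirmed checkpoint.

```python
# Minimal sketch of claim verification with Hugging Face transformers.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "amandakonet/climatebert-fact-checking"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

claim = "Global sea levels are rising."
evidence = "Satellite altimetry shows global mean sea level rising ~3 mm/yr."

# Claim and evidence are encoded as a sequence pair, as is typical for
# NLI-style fact-verification models.
inputs = tokenizer(claim, evidence, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
label = model.config.id2label[int(logits.argmax(dim=-1))]
print(label)  # e.g., SUPPORTS / REFUTES / NOT_ENOUGH_INFO (label set assumed)
```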
The spaCy `en_core_web_sm` pipeline is used for the text segmentation task. It is the smallest and fastest English model and, according to spaCy's Accuracy Evaluation, achieves the same metric values as the larger CPU-optimized models.
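A minimal sketch of this segmentation step is shown below; only the pipeline name is taken from the text, and the sample sentences are illustrative.

```python
# Sentence splitting with spaCy, as used behind the `split` endpoint.
# Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = (
    "Glaciers are retreating worldwide. "
    "Some regions warm faster than the global average."
)
sentences = [sent.text for sent in nlp(text).sents]
print(sentences)
# ['Glaciers are retreating worldwide.',
#  'Some regions warm faster than the global average.']
```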
In all of the examples below, each `batch` endpoint accepts an array of input sentences rather than a single sentence.
All of the following endpoints perform searches against the database containing full scientific article abstracts:

- `/api/abstract/evidence`
- `/api/abstract/evidence/batch`
- `/api/abstract/verify`
- `/api/abstract/verify/batch`
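A hypothetical usage sketch for two of these endpoints follows; the host, the HTTP method, and the request/response field names are assumptions, since the exact schema lives in the Technical documentation.

```python
# Hypothetical client calls to the abstract-level endpoints; host, method,
# and payload field names are assumptions for illustration only.
import requests

BASE = "http://localhost:8000"  # assumed host/port

# Single-claim verification with optional re-ranking.
single = requests.post(
    f"{BASE}/api/abstract/verify",
    json={"claim": "Arctic sea ice is shrinking.", "top_k": 5, "re_rank": True},
)
print(single.json())

# Batch variant: an array of claims instead of a single one.
batch = requests.post(
    f"{BASE}/api/abstract/verify/batch",
    json={
        "claims": [
            "Arctic sea ice is shrinking.",
            "CO2 levels are at a record high.",
        ],
        "top_k": 5,
    },
)
print(batch.json())
```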
All of the following endpoints perform searches against the database with the scientific article abstracts broken into individual phrases:

- `/api/phrase/evidence`
- `/api/phrase/evidence/batch`
- `/api/phrase/verify`
- `/api/phrase/verify/batch`
Please refer to the Technical documentation.
- Valenzuela-Escarcega, M.A., Ha, V.A., & Etzioni, O. (2015). Identifying Meaningful Citations. AAAI Workshop: Scholarly Big Data.
- Webersinke, N., Kraus, M., Bingler, J. A., & Leippold, M. (2021). ClimateBERT: A Pretrained Language Model for Climate-Related Text. arXiv preprint arXiv:2110.12010.
- Diggelmann, T., Boyd-Graber, J., Bulian, J., Ciaramita, M., & Leippold, M. (2020). CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Claims. Tackling Climate Change with Machine Learning workshop at NeurIPS 2020.