update doc for funder consolidation
kermitt2 committed Aug 26, 2023
1 parent aa4d3fa commit c384ff1
Showing 2 changed files with 8 additions and 5 deletions.
2 changes: 2 additions & 0 deletions doc/Consolidation.md
@@ -8,6 +8,8 @@ Consolidation has two main interests:

* The consolidation service matches the extracted bibliographical references with known publications, and complements the parsed bibliographical references with various metadata, in particular the DOI, making it possible to create a citation graph and to link the extracted references to external services.

The consolidation includes the CrossRef Funder Registry for enriching the extracted funder information.

GROBID supports two consolidation services:

* [CrossRef REST API](https://github.com/CrossRef/rest-api-doc) (default)
11 changes: 6 additions & 5 deletions doc/Grobid-service.md
@@ -111,24 +111,24 @@ Still to demonstrate [PDF.js] annotation possibilities, by default bibliographica

We describe below the resources provided for each HTTP verb to use the GROBID web services. All URLs described below are relative paths; the root URL is `http://<server instance name>/<root context>`

The consolidation parameters (`consolidateHeader` and `consolidateCitations`) indicate if GROBID should try to complete the extracted metadata with an additional external call to [CrossRef API](https://github.com/CrossRef/rest-api-doc). The CrossRef look-up is realized based on the reliable subset of extracted metadata which are supported by this API. Each consolidation parameter is a string which can have three values:
The consolidation parameters (`consolidateHeader`, `consolidateCitations`, `consolidateFunders`) indicate whether GROBID should try to complete the extracted metadata with an additional external call to the [CrossRef API](https://github.com/CrossRef/rest-api-doc) or [biblio-glutton](https://github.com/kermitt2/biblio-glutton). The CrossRef and biblio-glutton look-ups are performed based on the reliable subset of extracted metadata supported by these APIs. Each consolidation parameter is a string which can have three values:

* `0`, means no consolidation at all is performed: all the metadata will come from the source PDF
* `1`, means consolidation against CrossRef and update of metadata: when we have a DOI match, the publisher metadata are combined with the metadata extracted from the PDF, possibly correcting them
* `2`, means consolidation against CrossRef and, if matching, addition of the DOI only
* `1`, means consolidation against CrossRef/biblio-glutton and update of metadata: when we have a DOI match, the publisher metadata are combined with the metadata extracted from the PDF, possibly correcting them
* `2`, means consolidation against CrossRef/biblio-glutton and, if matching, addition of the DOI only
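
As an illustration of the values above, here is a minimal Python sketch of a client-side helper that assembles the consolidation form fields before a request is sent. The helper name `consolidation_fields` and the two dictionaries are hypothetical and not part of GROBID; the allowed values and defaults are taken from the parameter tables in this document (including the extra value `3` listed for `consolidateHeader`).

```python
# Hypothetical client-side helper, not part of GROBID itself.
# Allowed values per parameter, as listed in the parameter tables.
ALLOWED_VALUES = {
    "consolidateHeader": {"0", "1", "2", "3"},
    "consolidateCitations": {"0", "1", "2"},
    "consolidateFunders": {"0", "1", "2"},
}

# Default values as documented in the parameter tables.
DEFAULTS = {
    "consolidateHeader": "1",
    "consolidateCitations": "0",
    "consolidateFunders": "0",
}


def consolidation_fields(**overrides):
    """Return the consolidation form fields as strings, after validation."""
    fields = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in ALLOWED_VALUES:
            raise ValueError(f"unknown consolidation parameter: {name}")
        value = str(value)  # the service expects string values
        if value not in ALLOWED_VALUES[name]:
            raise ValueError(f"invalid value {value!r} for {name}")
        fields[name] = value
    return fields
```

The returned dictionary can then be passed as ordinary form fields alongside the `input` PDF in a `multipart/form-data` request.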

### PDF to TEI conversion services

#### /api/processHeaderDocument

Extract the header of the input PDF document, normalize it and convert it into a TEI XML or [BibTeX] format.

`consolidateHeader` is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), or `2` (consolidate the citation and inject DOI only).
`consolidateHeader` is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), or `2` (consolidate the header metadata and inject DOI only).

| method | request type | response type | parameters | requirement | description |
|--- |--- |--- |--- |--- |--- |
| POST, PUT | `multipart/form-data` | `application/xml` | `input` | required | PDF file to be processed |
| | | | `consolidateHeader` | optional | consolidateHeader is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), `2` (consolidate the citation and inject DOI only), or `3` (consolidate using only extracted DOI - if extracted) . |
| | | | `consolidateHeader` | optional | `consolidateHeader` is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), `2` (consolidate the header and inject DOI only), or `3` (consolidate using only extracted DOI - if extracted). |
| | | | `includeRawAffiliations` | optional | `includeRawAffiliations` is a boolean value, `0` (default, do not include raw affiliation string in the result) or `1` (include raw affiliation string in the result). |

Use `Accept: application/x-bibtex` to retrieve BibTeX format instead of TEI (note: the TEI XML format is much richer, it should be preferred if there is no particular reason to use BibTeX).
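
The following Python sketch shows one way such a request to `/api/processHeaderDocument` could be assembled with the standard library only. The function name `build_header_request`, the upload file name `input.pdf`, and the base URL `http://localhost:8070` are assumptions made for illustration; adapt the root URL to your own server instance. The request is only built here, not sent.

```python
import urllib.request
import uuid


def build_header_request(pdf_bytes, base_url="http://localhost:8070",
                         bibtex=False, consolidate_header="1"):
    """Build (without sending) a multipart POST for /api/processHeaderDocument."""
    boundary = uuid.uuid4().hex
    # Part 1: the PDF file, sent under the required `input` field.
    file_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="input"; filename="input.pdf"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode() + pdf_bytes + b"\r\n"
    # Part 2: the optional consolidateHeader string value.
    param_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="consolidateHeader"\r\n\r\n'
        f"{consolidate_header}\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        # Ask for BibTeX instead of the default TEI XML when requested.
        "Accept": "application/x-bibtex" if bibtex else "application/xml",
    }
    return urllib.request.Request(
        base_url + "/api/processHeaderDocument",
        data=file_part + param_part + closing,
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running service; with `bibtex=True` the response is expected in BibTeX rather than TEI XML.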
@@ -166,6 +166,7 @@ Convert the complete input document into TEI XML format (header, body and biblio
| POST, PUT | `multipart/form-data` | `application/xml` | `input` | required | PDF file to be processed |
| | | | `consolidateHeader` | optional | `consolidateHeader` is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), `2` (consolidate the header and inject DOI only), or `3` (consolidate using only extracted DOI - if extracted). |
| | | | `consolidateCitations` | optional | `consolidateCitations` is a string of value `0` (no consolidation, default value), `1` (consolidate and inject all extra metadata), or `2` (consolidate the citation and inject DOI only). |
| | | | `consolidateFunders` | optional | `consolidateFunders` is a string of value `0` (no consolidation, default value), `1` (consolidate and inject all extra metadata), or `2` (consolidate the funder and inject DOI only). |
| | | | `includeRawCitations` | optional | `includeRawCitations` is a boolean value, `0` (default, do not include raw reference string in the result) or `1` (include raw reference string in the result). |
| | | | `includeRawAffiliations` | optional | `includeRawAffiliations` is a boolean value, `0` (default, do not include raw affiliation string in the result) or `1` (include raw affiliation string in the result). |
| | | | `teiCoordinates` | optional | list of element names for which coordinates in the PDF document have to be added, see [Coordinates of structures in the original PDF](Coordinates-in-PDF.md) for more details |