OpenCitations: Preprocess

This software preprocesses data dumps from different data sources before they are ingested into OpenCitations. Its aim is to make data parsing and extraction easier in the OpenCitations Meta and OpenCitations Index processes. Preprocessing is not a mandatory step of data ingestion in OpenCitations. However, it is suggested when:

A substantial portion of the bibliographic entities represented in the dump come without citation data
The dump content is redundant with respect to the scope of OpenCitations (e.g. duplicated citations that can be retrieved both as outgoing and as incoming citations)
The dump consists of a single large file that is too heavy to be processed all at once
A substantial portion of the data provided is not relevant to the scope of OpenCitations (e.g. discipline-specific and content-related metadata)
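
The cases listed above can be illustrated with a minimal sketch. It is not the code shipped in this repository: it assumes a hypothetical dump in JSON Lines format in which each record has an `id` field and a `references` list, and the function names `filter_records` and `split_dump` are made up for the example. The sketch drops entities without citation data, keeps only the fields needed for citations, and splits one large file into smaller chunk files.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List


def filter_records(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only records that carry citation data, reduced to two fields.

    The field names "id" and "references" are assumptions made for this
    example; a real dump will use its own schema.
    """
    for record in records:
        citing = record.get("id")
        references = record.get("references") or []
        if citing and references:
            # Incoming citations and content-related metadata are dropped.
            yield {"id": citing, "references": list(references)}


def split_dump(src: Path, out_dir: Path, chunk_size: int = 1000) -> List[Path]:
    """Stream a JSON Lines dump and write the filtered records in chunks."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    chunk: List[dict] = []

    def flush() -> None:
        # Write the current chunk (if any) to its own numbered file.
        if chunk:
            path = out_dir / f"chunk_{len(written):05d}.jsonl"
            with path.open("w", encoding="utf-8") as out:
                out.writelines(json.dumps(r) + "\n" for r in chunk)
            written.append(path)
            chunk.clear()

    # The source file is read line by line, so it never has to fit in memory.
    with src.open(encoding="utf-8") as dump:
        records = (json.loads(line) for line in dump if line.strip())
        for record in filter_records(records):
            chunk.append(record)
            if len(chunk) >= chunk_size:
                flush()
    flush()  # write the last, possibly shorter, chunk
    return written
```

Calling `split_dump(Path("dump.jsonl"), Path("out"), chunk_size=1000)` would write files such as `out/chunk_00000.jsonl`, each holding at most 1000 filtered records. The file names and the chunk size are arbitrary choices for this sketch.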

Requirements

Python 3.8+

Start the tests

$ python -m unittest discover -s ./preprocessing/test -p "*.py"

License

This software is released under the ISC License.

martasoricetti/preprocess
