National Statistical Offices Statistics Fetcher

Fetches and cleans data from national statistical office (NSO) websites and publishes it in a standardised, tidy data format.

This work has two goals:

To provide a database of well-formatted data that can be used in Full Fact’s Stats Checking tools.
To highlight how much work is involved in collecting and comparing national statistics data across countries, as discussed in the write-up.

Each data file follows a simple timescale,observation format. The time is written as YYYY-MM, and the observation is a percentage change. For example:

month,observation
1996-01,47.56
1996-02,43.645
1996-03,41.9048
...
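As a minimal sketch of how a consumer might read one of these files, the snippet below parses the sample rows above using only the Python standard library. The function name `read_observations` is hypothetical and not part of this repo; it assumes the header row is exactly `month,observation`, as in the example.

```python
import csv
import io
from datetime import datetime


def read_observations(text):
    """Parse month,observation CSV text into (datetime, float) pairs.

    Hypothetical helper for illustration; not part of nso-stats-fetcher.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # Months are written as YYYY-MM; strptime maps them to the 1st of the month.
        month = datetime.strptime(row["month"], "%Y-%m")
        rows.append((month, float(row["observation"])))
    return rows


# The sample rows shown above, inlined so the sketch is self-contained.
sample = "month,observation\n1996-01,47.56\n1996-02,43.645\n1996-03,41.9048\n"
series = read_observations(sample)
```

To read a real file, pass the contents of a CSV from the ./data directory instead of the inlined sample.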

These are the statistics that are fetched, reformatted and stored in the ./data directory:

Argentina
- Consumer price index – monthly year-on-year (source)
Ireland
- Consumer price index – monthly year-on-year (source)
Japan
- Consumer price index – monthly year-on-year (source)
Mexico
- Consumer price index – monthly year-on-year (source)
Nigeria
- Consumer price index – monthly year-on-year (source)
Philippines
- Consumer price index – monthly year-on-year (source)
UK
South Africa
- Consumer price index – monthly year-on-year (source)
- Producer price index – monthly year-on-year (source)
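For readers unfamiliar with the term, "monthly year-on-year" means each month's value is compared with the same month one year earlier. The sketch below shows that arithmetic for a monthly index. It is an illustration only: the function name `year_on_year` and the index values are made up, and some sources may publish the percentage change directly rather than an index.

```python
def year_on_year(index_by_month):
    """Return {month: percentage change versus the same month a year earlier}.

    Hypothetical helper for illustration; keys are YYYY-MM strings.
    """
    changes = {}
    for month, value in index_by_month.items():
        year, mm = month.split("-")
        previous = f"{int(year) - 1:04d}-{mm}"
        # Months without a value 12 months earlier are skipped.
        if previous in index_by_month:
            changes[month] = (value / index_by_month[previous] - 1) * 100
    return changes


# Made-up index levels, not real statistics.
index = {"2023-01": 100.0, "2023-02": 102.0, "2024-01": 110.0, "2024-02": 107.1}
yoy = year_on_year(index)
```

Here only the 2024 months get a value, because the 2023 months have no earlier year to compare against.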

In almost all cases the data file is downloaded and read in (the exception is the Philippines, where the numbers were hard-coded). Ideally the files would be JSON or CSV, but some countries publish PDF or XLS files. The online location of each file, along with other metadata, is stored in data/nso_stats_metadata.json.
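Once a source file has been read in, the final step is writing the values out in the tidy format. The following is a rough sketch of that step under stated assumptions, not the code used in this repo: the function name `to_tidy_csv` is hypothetical, and it assumes the observations have already been extracted into a dict keyed by YYYY-MM strings.

```python
import csv
import io


def to_tidy_csv(observations):
    """Serialise {month: observation} as month,observation CSV text.

    Hypothetical helper for illustration; not part of nso-stats-fetcher.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["month", "observation"])
    # YYYY-MM strings sort chronologically, so a plain sort is enough.
    for month in sorted(observations):
        writer.writerow([month, observations[month]])
    return buffer.getvalue()


tidy = to_tidy_csv({"1996-02": 43.645, "1996-01": 47.56})
```

Sorting by month keeps the output stable regardless of the order in which a source lists its rows.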

It is also deployed as a GitHub Action that runs several times between 6am and 10am UTC, so some of the statistics should stay up to date. You can view the workflow in .github/workflows/fetch_stats.yaml. However, because the published formats of these statistics vary and can change, it wouldn't be surprising if the action breaks at some point.

Dependencies

Java 8+ (for Tabula to read PDFs)
Python 3.10+
- It likely works for older versions of Python, but it hasn't been tested

Setup

Clone this repo

git clone https://github.com/FullFact/nso-stats-fetcher.git

Install required libraries

Either

poetry install

or

pip install -r requirements.txt

To run the scripts and fetch updated versions of all the statistics data, run:

python src/nsofetch/fetch_all.py

Alternatively, run each country's script individually. We use ISO 3166 country codes as standardised country names.
