Intake plugin for HTML tables.
pip install -e git+https://github.com/compilerla/intake-html-table@main#egg=intake-html-table
Or, from a local clone:
git clone https://github.com/compilerla/intake-html-table
cd intake-html-table/
pip install -e .
See examples/notebook.ipynb or view on nbviewer for more.
From an intake catalog
Use the html_table driver to read data from HTML tables. Pass additional kwargs to pandas.read_html():
metadata:
  version: 1
sources:
  table:
    description: Read from an HTML table with id=data, skipping the first 2 rows
    driver: html_table
    args:
      urlpath: "https://example.com/"
      attr:
        id: data
      skiprows: 2
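To illustrate what selecting a table by id and skipping rows means on the pandas side, here is a minimal sketch that calls pandas.read_html() directly on an inline HTML string. The HTML content is hypothetical, and the sketch assumes pandas and an HTML parser such as lxml are installed. Note that pandas names the attribute filter `attrs`; how the driver forwards the catalog args above is not shown here.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the one with id="data" is wanted.
HTML = """
<table id="other">
  <tr><td>ignored</td><td>0</td></tr>
</table>
<table id="data">
  <tr><td>Quarterly report</td><td>draft</td></tr>
  <tr><td>Generated</td><td>today</td></tr>
  <tr><td>a</td><td>1</td></tr>
  <tr><td>b</td><td>2</td></tr>
</table>
"""

# attrs filters tables by HTML attributes; skiprows drops the first 2 rows.
# read_html always returns a list of DataFrames, one per matching table.
tables = pd.read_html(StringIO(HTML), attrs={"id": "data"}, skiprows=2)
df = tables[0]
print(df)
```

Because no header row is declared in this sample, the resulting columns are numbered rather than named.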
Use the apache_dir driver to read a catalog from an Apache Server directory:
metadata:
  version: 1
sources:
  ncei:
    description: National Centers for Environmental Information data catalog
    driver: apache_dir
    args:
      urlpath: "https://www.ncei.noaa.gov/data/"
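For a sense of what reading a directory listing involves, the sketch below pulls entry links out of an Apache-style index page using only the standard library. It is not the plugin's implementation: the sample HTML, the `LinkCollector` class, and the `list_entries` helper are all hypothetical, and real listings vary with server configuration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical Apache autoindex page; real markup depends on server settings.
SAMPLE_LISTING = """
<html><body><h1>Index of /data</h1>
<table>
<tr><th><a href="?C=N;O=D">Name</a></th><th><a href="?C=M;O=A">Last modified</a></th></tr>
<tr><td><a href="/">Parent Directory</a></td><td>&nbsp;</td></tr>
<tr><td><a href="daily/">daily/</a></td><td>2024-01-01 00:00</td></tr>
<tr><td><a href="readme.txt">readme.txt</a></td><td>2024-01-01 00:00</td></tr>
</table></body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in the order encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def list_entries(html, base_url):
    """Map each entry name in the listing to its absolute URL."""
    parser = LinkCollector()
    parser.feed(html)
    entries = {}
    for href in parser.hrefs:
        # Skip column-sort links (?C=...) and absolute paths such as the parent directory.
        if href.startswith(("?", "/")):
            continue
        entries[href.rstrip("/")] = urljoin(base_url, href)
    return entries


entries = list_entries(SAMPLE_LISTING, "https://example.com/data/")
for name, url in entries.items():
    print(name, url)
```

Entries ending in a slash are subdirectories, which a catalog driver could in principle expose as nested catalogs.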
Run the test suite (from the root of the repository):
coverage run -m pytest
To view the coverage report with indicators for untested (missed) lines:
coverage report -m
To update the README badge from the latest test run:
coverage-badge -f -o tests/coverage.svg
The -f argument ensures the existing badge is overwritten.
Tests also run via GitHub Actions on events against the main branch.