Scrape and harvest collections / sources P4 (OELA) #137

osahon-okungbowa · 2020-05-15T16:16:28Z

Depends on #114

Description

Harvesting collections and sources depends on the schema validation allowing groups of different types and package relationships in the data.json source files.

The proposed spec for implementation is located here: Specs For Implementing data.json Validation Schema for Dept of Ed

Format

Uses the same format outlined in #115.

Scraping rules:

If an HTML page contains multiple datasets -> extract the page itself as a collection
If an HTML page contains no datasets, but it has multiple links to pages that are collections -> extract the page as a source
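
The two rules above could be sketched roughly as follows. This is only an illustration, not the scraper's actual code: `PageSummary`, `classify_page`, and their fields are hypothetical names, and "multiple" is assumed to mean two or more.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PageSummary:
    """Hypothetical summary of what the scraper found on one HTML page."""
    url: str
    dataset_count: int = 0
    # URLs of linked pages that were themselves classified as collections
    collection_links: List[str] = field(default_factory=list)


def classify_page(page: PageSummary) -> Optional[str]:
    """Return 'collection', 'source', or None when neither rule applies."""
    # Rule 1: multiple datasets on the page -> the page itself is a collection.
    # Assumption: "multiple" means at least two.
    if page.dataset_count >= 2:
        return "collection"
    # Rule 2: no datasets, but multiple links to collection pages -> source.
    if page.dataset_count == 0 and len(page.collection_links) >= 2:
        return "source"
    # Single-dataset pages and everything else are left unclassified here.
    return None
```

A page with exactly one dataset falls through both rules; how that case should be handled is not stated above, so the sketch returns `None` for it.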

CKAN extensions updates:

The datajson extension needs Collection / Source processing capabilities based on the data it finds in the data.json file.
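
As a rough idea of what such processing might look like, the sketch below groups datasets by their parent. The field names used by this project are defined in #115 and are not reproduced here; this example assumes instead that membership is expressed with the `isPartOf` field from the DCAT-US (Project Open Data) v1.1 schema. `group_by_collection` is a hypothetical helper, not an existing function of the extension.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_collection(catalog: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each parent identifier to the identifiers of its member datasets.

    Assumption: child datasets point at their parent through `isPartOf`,
    as in DCAT-US v1.1. The format from #115 may differ.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for dataset in catalog.get("dataset", []):
        parent = dataset.get("isPartOf")
        if parent:
            groups[parent].append(dataset["identifier"])
    return dict(groups)
```

Datasets without a parent are skipped, so the result lists only collections that have at least one member.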

SITUATION

Based on the format and scraping rules specified above, the OELA office requires an implementation of collections and sources.

Tasks

Implement the scraping output changes
Implement the scraping rules for Collection / Source
Add the new items to the datajson schema we are using
Load a datajson containing collections and sources into a harvester source and test

Acceptance criteria:

Sources and collections are implemented for the OELA office and are visible on the staging portal.
