diff --git a/README.md b/README.md
index 0d81a5b..7d182ac 100644
--- a/README.md
+++ b/README.md
@@ -5,15 +5,15 @@
 This repository contains code that is used as a runnable task in ECS. The entry point [task/load.sh](task/load.sh)
-expects environment variables are set for:
-
+expects the following environment variables to be set:
+
     S3_BUCKET=some-bucket
     S3_KEY=some-key
 
-These provide a bucket and key to load data from. At the moment the keys are assumed to be sqlite files produced by
+These provide a bucket and key to load data from. At the moment the keys are assumed to be sqlite files produced by
 the digital land collection process.
 
-The task is triggered by an S3 put object event tracked by AWS Cloudtrail which extracts event metadata for the S3
-bucket and key name and provides those to this container as enviroment variables.
+The task is triggered by an S3 put object event tracked by AWS CloudTrail, which extracts event metadata for the S3
+bucket and key name and provides those to this container as environment variables.
 The setup of this is in the [Tasks module](https://github.com/digital-land/digital-land-infrastructure/tree/main/terraform/modules/tasks)
 of the digital-land-terraform repository.
@@ -26,10 +26,10 @@ To see how the values for bucket and key are extracted have a [look here](https:
 - A running postgres server (tested with PostgreSQL 14)
 - curl
-- sqlite
+- sqlite
 
-The assumption is that the target digital_land db already has entity and dataset tables. In other words migrations
-from the [digital-land.info](https://github.com/digital-land/digital-land.info) repository should have been run against
-the postgres database you want to load data into (the postgres database used by you locally running digital-land.info
+The assumption is that the target digital_land db already has entity and dataset tables. In other words, migrations
+from the [digital-land.info](https://github.com/digital-land/digital-land.info) repository should have been run against
+the postgres database you want to load data into (the postgres database used by your locally running digital-land.info
 web application)
@@ -46,20 +46,36 @@ To load the entity database change the S3_KEY to the correct key for the entity
    cd into the task directory and run:
 
-       pip install -r requirements.txt
+       pip install -r requirements.txt
 
 3. **Run the load script in task directory to load digital-land**
 
-   Remember the .env file is already set to load the digital-land db. However in order to load the db without using
-   an aws account sign in you will need to use a different script
-
+   Remember the .env file is already set to load the digital-land db. However, to load the db without signing in
+   to an AWS account you will need to use a different script:
+
        ./load_local.sh
-
-6. **Run the load script to load entity database**
-
+
+4. **Run the load script to load the entity database**
+
    Update the S3_KEY in the .env file to
 
        S3_KEY=entity-builder/dataset/entity.sqlite3
 
        ./load_local.sh
-
+
 You'll notice that the load script downloads sqlite databases and creates csv files in the directory it runs from. These
-files are git and docker ignored, so once done loading you can delete. It's a dumb script so each time you run it
+files are git and docker ignored, so once done loading you can delete them. It's a dumb script, so each time you run it
 the files get downloaded/created again.
+
+## Adding the metadata to the local database
+
+This loads the datasets and other metadata that are not included in the collections.
+
+1. Update the S3_KEY in the .env file to point at the digital-land builder database, for example:
+
+```
+S3_KEY=digital-land-builder/dataset/digital-land.sqlite3
+```
+
+2. Reload the environment variables:
+
+```
+source .env
+```
+
+3. Run the load script again:
+
+```
+./load_local.sh
+```
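+
+For reference, the .env file that these steps edit is plain shell, which is why it can be loaded with `source`.
+A minimal sketch of what it might contain (the bucket name below is illustrative, not the real one):
+
+```
+# .env - example values only; use the real bucket and the key you want to load
+S3_BUCKET=some-bucket
+S3_KEY=digital-land-builder/dataset/digital-land.sqlite3
+```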
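+
+For completeness, the ECS entry point [task/load.sh](task/load.sh) can be exercised locally in the same way by
+setting the two variables it expects. This is a sketch only: unlike load_local.sh, it assumes you are signed in
+to an AWS account that can read the bucket (values illustrative):
+
+```
+S3_BUCKET=some-bucket S3_KEY=some-key ./load.sh
+```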
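+
+To sanity-check a load, you can count rows in the tables the digital-land.info migrations created. A minimal
+sketch, assuming a local postgres with a digital_land database (adjust connection details to your setup):
+
+```
+psql -d digital_land -c "select count(*) from dataset;"
+psql -d digital_land -c "select count(*) from entity;"
+```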