include instructions on how to save metadata #14

Merged: 1 commit, Jan 11, 2024
README.md (29 additions, 13 deletions)
This repository contains code that is used as a runnable task in ECS. The
entry point [task/load.sh](task/load.sh) expects the following environment
variables to be set:

S3_BUCKET=some-bucket
S3_KEY=some-key

These provide a bucket and key to load data from. At the moment the keys are assumed to be sqlite files produced by
the digital land collection process.
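
For example, running the task locally with the variables exported might look like the sketch below (the bucket is a placeholder; the key reuses the entity key mentioned later in this README):

```
# Minimal sketch: load.sh is assumed to read S3_BUCKET and S3_KEY from the environment.
export S3_BUCKET=some-bucket                            # placeholder bucket
export S3_KEY=entity-builder/dataset/entity.sqlite3     # entity key from the steps below
./task/load.sh
```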

The task is triggered by an S3 put object event tracked by AWS Cloudtrail, which extracts event metadata for the S3
bucket and key name and provides those to this container as environment variables.
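
In other words, an upload such as the hypothetical one below is what produces the put object event that kicks the task off:

```
# Uploading a collection sqlite file to the watched bucket raises the PutObject
# event that CloudTrail records (bucket and key here are placeholders).
aws s3 cp digital-land.sqlite3 s3://some-bucket/some-key
```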

The setup of this is in the [Tasks module](https://github.com/digital-land/digital-land-infrastructure/tree/main/terraform/modules/tasks)
of the digital-land-terraform repository.
To see how the values for bucket and key are extracted, have a [look here](https:

- A running postgres server (tested with PostgreSQL 14)
- curl
- sqlite
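
A quick way to check the requirements above are installed (this assumes the sqlite command line tool is available as `sqlite3`):

```
# Each command prints a version string if the tool is on PATH.
psql --version
curl --version
sqlite3 --version
```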

The assumption is that the target digital_land db already has entity and dataset tables. In other words, migrations
from the [digital-land.info](https://github.com/digital-land/digital-land.info) repository should have been run against
the postgres database you want to load data into (the postgres database used by your locally running digital-land.info web
application).
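
To confirm that before loading, something along these lines should work (it assumes the local database is named digital_land and that your shell user can connect to it):

```
# If the digital-land.info migrations have been applied, both tables are listed.
psql -d digital_land -c '\dt entity' -c '\dt dataset'
```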

To load the entity database change the S3_KEY to the correct key for the entity

cd into the task directory and run:

pip install -r requirements.txt

3. **Run the load script in the task directory to load digital-land**

Remember the .env file is already set to load the digital-land db. However, in order to load the db without signing in to an AWS account you will need to use a different script:

./load_local.sh

6. **Run the load script to load the entity database**

Update the S3_KEY in the .env file to S3_KEY=entity-builder/dataset/entity.sqlite3

./load_local.sh
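
One way to make that change from the shell, assuming the .env file contains a single `S3_KEY=` line (use `sed -i ''` on macOS):

```
# Point S3_KEY at the entity sqlite file, reload the environment, and rerun the loader.
sed -i 's|^S3_KEY=.*|S3_KEY=entity-builder/dataset/entity.sqlite3|' .env
source .env
./load_local.sh
```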

You'll notice that the load script downloads sqlite databases and creates csv files in the directory it runs from. These
files are git and docker ignored, so you can delete them once loading is done. It's a dumb script, so each time you run it
the files get downloaded/created again.
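
For example, a cleanup along these lines (assuming the downloaded and generated files sit in the current directory with these extensions) removes them:

```
# Delete the downloaded sqlite databases and generated csv files once loading is finished.
rm -f ./*.sqlite3 ./*.csv
```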

## Adding the metadata to the local database

This includes things like the datasets and other metadata that are not included in the collections.

1. Replace the S3_KEY environment variable in the .env file so that it points at the digital-land builder sqlite file (rather than the entity file loaded above).
2. Reload the environment:
   ```
   source .env
   ```
3. Run the load script again:
   ```
   ./load_local.sh
   ```
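
Put together, the sequence might look like the sketch below. Note that `<digital-land-builder-key>` is a placeholder; this README does not spell out the actual S3 key for the digital-land builder sqlite file, so substitute the real key before running:

```
# <digital-land-builder-key> is a placeholder for the real builder sqlite key.
sed -i 's|^S3_KEY=.*|S3_KEY=<digital-land-builder-key>|' .env
source .env
./load_local.sh
```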