Version 0.1.0 #70

Merged on Jul 19, 2022 (46 commits)

Commits:
514449c Clean generate notebook (ivanzvonkov, Jul 15, 2022)
72a073d Generate requirements file also (ivanzvonkov, Jul 16, 2022)
50eab95 Generate notebook entirely creates repo (ivanzvonkov, Jul 16, 2022)
7b40c2a Upgrade test project (ivanzvonkov, Jul 16, 2022)
bdf9717 Skip if certain files don't exist (ivanzvonkov, Jul 16, 2022)
acc2407 Check for unused features (ivanzvonkov, Jul 18, 2022)
044d7ed Update version (ivanzvonkov, Jul 18, 2022)
57601d8 Auto generate description (ivanzvonkov, Jul 18, 2022)
b7c049a Regenerate project with 0.0.3 (ivanzvonkov, Jul 18, 2022)
8b1dad1 Make test actually fail (ivanzvonkov, Jul 18, 2022)
53cfd67 remove unused import (ivanzvonkov, Jul 18, 2022)
7addbfe Remove unused notebook (ivanzvonkov, Jul 18, 2022)
84e63d0 Move features into csv (ivanzvonkov, Jul 19, 2022)
b34c1ed Regenerate crop-mask project (ivanzvonkov, Jul 19, 2022)
cfac6e4 Add new datasets to dvc (ivanzvonkov, Jul 19, 2022)
324638d Messed up copy and paste (ivanzvonkov, Jul 19, 2022)
2a8a4ed formatting (ivanzvonkov, Jul 19, 2022)
f2a7974 fix datapath (ivanzvonkov, Jul 19, 2022)
9921600 Correct crop-mask-example bucket (ivanzvonkov, Jul 19, 2022)
ffbb72d Regenerate buildings-example (ivanzvonkov, Jul 19, 2022)
a5a6f3c Rename report (ivanzvonkov, Jul 19, 2022)
4d6ed0c Upgrade maize-example (ivanzvonkov, Jul 19, 2022)
50c4a4e Update buildings dataset (ivanzvonkov, Jul 19, 2022)
8503cf2 Tutorial uses 0.1.0 (ivanzvonkov, Jul 19, 2022)
3a96fd9 Write report (ivanzvonkov, Jul 19, 2022)
78a1bbf ensure status is included (ivanzvonkov, Jul 19, 2022)
35f4797 pin openmapflow version (ivanzvonkov, Jul 19, 2022)
5d06459 Remove features naming (ivanzvonkov, Jul 19, 2022)
85d3411 Update duse create_datasets (ivanzvonkov, Jul 19, 2022)
59d7488 Ensure order stays the same (ivanzvonkov, Jul 19, 2022)
caf7639 regenerate reports (ivanzvonkov, Jul 19, 2022)
da69a2d Update datasets (ivanzvonkov, Jul 19, 2022)
0ee21e6 Remove duplicates (ivanzvonkov, Jul 19, 2022)
007e40d use eo vs tifs (ivanzvonkov, Jul 19, 2022)
6b56cbd Rename to eo (ivanzvonkov, Jul 19, 2022)
93cb1b2 continue eo renaming (ivanzvonkov, Jul 19, 2022)
f300473 Regenerate projects (ivanzvonkov, Jul 19, 2022)
1f6c0d3 Merge branch 'main' into clean-generate (ivanzvonkov, Jul 19, 2022)
8fc54a3 gee bug (ivanzvonkov, Jul 19, 2022)
98d16e8 raw labels bug (ivanzvonkov, Jul 19, 2022)
c11af7d Test adding a new dataset (ivanzvonkov, Jul 19, 2022)
b55253d Regenerate reports (ivanzvonkov, Jul 19, 2022)
856104c Consistent eo_data prefix (ivanzvonkov, Jul 19, 2022)
732b4f4 Setup eo cols in a function (ivanzvonkov, Jul 19, 2022)
f47b574 Simpler notebook for adding data (ivanzvonkov, Jul 19, 2022)
709bdad Update datasets (ivanzvonkov, Jul 19, 2022)
4 changes: 1 addition & 3 deletions .github/workflows/buildings-example-test.yaml
@@ -30,9 +30,7 @@ jobs:
 # https://dvc.org/doc/user-guide/setup-google-drive-remote#authorization
 GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
 run: |
-dvc pull $(openmapflow datapath PROCESSED_LABELS) -f
-dvc pull $(openmapflow datapath COMPRESSED_FEATURES) -f
-tar -xvzf $(openmapflow datapath COMPRESSED_FEATURES) -C data/
+dvc pull $(openmapflow datapath DATASETS) -f
 dvc pull $(openmapflow datapath MODELS) -f

 - name: Integration test - Project
4 changes: 1 addition & 3 deletions .github/workflows/crop-mask-example-test.yaml
@@ -30,9 +30,7 @@ jobs:
 # https://dvc.org/doc/user-guide/setup-google-drive-remote#authorization
 GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
 run: |
-dvc pull $(openmapflow datapath PROCESSED_LABELS) -f
-dvc pull $(openmapflow datapath COMPRESSED_FEATURES) -f
-tar -xvzf $(openmapflow datapath COMPRESSED_FEATURES) -C data/
+dvc pull $(openmapflow datapath DATASETS) -f
 dvc pull $(openmapflow datapath MODELS) -f

 - name: Integration test - Project
4 changes: 1 addition & 3 deletions .github/workflows/maize-example-test.yaml
@@ -30,9 +30,7 @@ jobs:
 # https://dvc.org/doc/user-guide/setup-google-drive-remote#authorization
 GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
 run: |
-dvc pull $(openmapflow datapath PROCESSED_LABELS) -f
-dvc pull $(openmapflow datapath COMPRESSED_FEATURES) -f
-tar -xvzf $(openmapflow datapath COMPRESSED_FEATURES) -C data/
+dvc pull $(openmapflow datapath DATASETS) -f
 dvc pull $(openmapflow datapath MODELS) -f

 - name: Integration test - Project
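In all three example workflows, the data step now pulls only the `DATASETS` and `MODELS` paths; the processed labels and the compressed features archive (and its `tar -xvzf` extraction) are no longer fetched. The function below is a minimal sketch of that step as a reusable shell function. `pull_ci_data` is a hypothetical name, and the sketch assumes `openmapflow datapath KEY` prints a single path, as the workflow's `$(...)` usage implies.

```shell
# Hypothetical helper: the slimmed-down CI data step as one function.
# Pulls only the DVC-tracked datasets and models.
pull_ci_data() {
  for key in DATASETS MODELS; do
    # Assumes `openmapflow datapath KEY` prints one path, as in the workflow.
    target=$(openmapflow datapath "$key") || return 1
    # Same flags as the workflow step (-f is DVC's --force).
    dvc pull "$target" -f || return 1
  done
}
```

Returning on the first failure keeps a CI job from running tests against a partially pulled workspace.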
26 changes: 12 additions & 14 deletions README.md
@@ -88,19 +88,22 @@ After all configuration is set, the following project structure will be generate
 └─── data
 │ raw_labels/ # User added labels
-│ processed_labels/ # Labels standardized to common format
-│ features/ # Labels combined with satellite data
-│ compressed_features.tar.gz # Allows faster features downloads
-│ models/ # Models trained using features
+│ datasets/ # ML ready datasets (labels + earth observation data)
+│ models/ # Models trained using datasets
 | raw_labels.dvc # Reference to a version of raw_labels/
-| processed_labels.dvc # Reference to a version of processed_labels/
-│ compressed_features.tar.gz.dvc # Reference to a version of features/
+| datasets.dvc # Reference to a version of datasets/
 │ models.dvc # Reference to a version of models/

 ```

 This project contains all the code necessary for: Adding data ➞ Training a model ➞ Creating a map.

+**Important:** When code is pushed to the repository, a GitHub action runs to verify project configuration, data integrity, and script functionality. This action pulls data using DVC and therefore needs access to remote storage (your Google Drive). To give the GitHub action access to the data, add a new repository secret ([instructions](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository)).
+- In step 5 of the instructions, name the secret: `GDRIVE_CREDENTIALS_DATA`
+- In step 6, enter the contents of `.dvc/tmp/gdrive-user-credentials.json` (in your repository)
+
+After this, the GitHub action should run successfully.
+

 ## Adding data [![cb]](https://colab.research.google.com/github/nasaharvest/openmapflow/blob/main/openmapflow/notebooks/new_data.ipynb)

@@ -134,25 +137,20 @@ datasets = [
 ...
 ]
 ```
-Run feature creation:
+Run dataset creation:
 ```bash
 earthengine authenticate # For getting new earth observation data
 gcloud auth login # For getting cached earth observation data

-openmapflow create-features # Initiates or checks progress of features creation
+openmapflow create-dataset # Initiates or checks progress of dataset creation
 openmapflow datasets # Shows the status of datasets

 dvc commit && dvc push # Push new data to data version control

 git add .
-git commit -m'Created new features'
+git commit -m'Created new dataset'
 git push
 ```
-**Important:** When new data is pushed to the repository, a GitHub action runs to verify data integrity. This action pulls data using DVC and therefore needs access to remote storage (your Google Drive). To give the GitHub action access to the data, add a new repository secret ([instructions](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository)).
-- In step 5 of the instructions, name the secret: `GDRIVE_CREDENTIALS_DATA`
-- In step 6, enter the contents of `.dvc/tmp/gdrive-user-credentials.json` (in your repository)
-
-After this, the GitHub action should run successfully if the data is valid.


 ## Training a model [![cb]](https://colab.research.google.com/github/nasaharvest/openmapflow/blob/main/openmapflow/notebooks/train.ipynb)
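With this release, a project's `data/` directory is versioned through three DVC pointers (`raw_labels.dvc`, `datasets.dvc`, `models.dvc`), while `processed_labels.dvc` and `compressed_features.tar.gz.dvc` are deleted from the example projects. When upgrading an older project, a quick check can flag missing or leftover pointers. The sketch below is a hypothetical helper, not an openmapflow command; it only inspects the pointer files, since the directories themselves are git-ignored and appear only after `dvc pull`.

```shell
# Hypothetical helper: compare a project's data/ directory with the
# version 0.1.0 pointer layout described in the README.
check_data_layout() {
  data_dir=${1:-data}
  rc=0
  # Pointers expected in the new layout.
  for entry in raw_labels.dvc datasets.dvc models.dvc; do
    if [ ! -e "$data_dir/$entry" ]; then
      echo "missing: $data_dir/$entry"
      rc=1
    fi
  done
  # Pointers that this release removes.
  for entry in processed_labels.dvc compressed_features.tar.gz.dvc; do
    if [ -e "$data_dir/$entry" ]; then
      echo "stale: $data_dir/$entry"
      rc=1
    fi
  done
  return "$rc"
}
```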
4 changes: 1 addition & 3 deletions buildings-example/data/.gitignore
@@ -1,5 +1,3 @@
+/datasets
 /raw_labels
-/processed_labels
-/compressed_features.tar.gz
 /models
-/features
4 changes: 0 additions & 4 deletions buildings-example/data/compressed_features.tar.gz.dvc

This file was deleted.

5 changes: 5 additions & 0 deletions buildings-example/data/datasets.dvc
@@ -0,0 +1,5 @@
+outs:
+- md5: db853058c80b597bb44bfc0ecf37866f.dir
+  size: 121467360
+  nfiles: 2
+  path: datasets
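The new `datasets.dvc` pointer records the directory's content hash (DVC marks directory hashes with a `.dir` suffix), its total size in bytes, and its file count: here 2 files totalling 121467360 bytes, roughly 121 MB. When reviewing a PR that only touches pointers, it can be handy to read these fields without running DVC. The function below is a minimal sketch under that assumption; `dvc_out_field` is a hypothetical helper (not part of DVC or openmapflow) that expects the simple one-`key: value`-per-line layout shown in this diff.

```shell
# Hypothetical helper: print one field of the first matching output entry
# in a .dvc pointer file laid out like datasets.dvc above.
dvc_out_field() {
  awk -v key="$2" '
    # Drop indentation and the leading "- " of a list item.
    { sub(/^[ -]+/, "") }
    $1 == (key ":") { print $2; exit }
  ' "$1"
}
```

For example, `dvc_out_field buildings-example/data/datasets.dvc nfiles` would print `2` for the version of the file in this PR. For pointers with several `outs` entries, a real YAML parser is the more robust choice; this sketch stops at the first match.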
4 changes: 0 additions & 4 deletions buildings-example/data/duplicates.txt

This file was deleted.

133 changes: 0 additions & 133 deletions buildings-example/data/missing.txt

This file was deleted.

5 changes: 0 additions & 5 deletions buildings-example/data/processed_labels.dvc

This file was deleted.

@@ -2,18 +2,20 @@ DATASET REPORT (autogenerated, do not edit directly)

Uganda_buildings_2020 (Timesteps: 24)
----------------------------------------------------------------------------
eo_data_complete 8117
eo_data_duplicate 4
✔ training amount: 6445, positive class: 100.0%
✔ testing amount: 848, positive class: 100.0%
✔ validation amount: 824, positive class: 100.0%
✔ testing amount: 848, positive class: 100.0%



geowiki_landcover_2017 (Timesteps: 24)
----------------------------------------------------------------------------
eo_data_complete 13993
eo_data_export_failed 242
eo_data_missing_values 132
✔ training amount: 12582, positive class: 0.0%
✔ validation amount: 743, positive class: 0.0%
✔ testing amount: 668, positive class: 0.0%


All data:
✔ Found no empty features
✔ No duplicates found
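In the regenerated report, each dataset's split amounts add up to its `eo_data_complete` count: 6445 + 824 + 848 = 8117 for Uganda_buildings_2020, and 12582 + 743 + 668 = 13993 for geowiki_landcover_2017. Assuming that relationship is meant to hold, a reviewer could check it mechanically after each regeneration. The awk-based function below is a minimal sketch that assumes the report keeps the plain-text layout shown here; `check_split_totals` is a hypothetical helper, not part of openmapflow.

```shell
# Hypothetical helper: for each dataset section of a plain-text report, check
# that the training, validation, and testing amounts add up to
# eo_data_complete. Prints any mismatch and exits 1 if one is found.
check_split_totals() {
  awk '
    function flush() {
      if (name != "" && complete != "" && total != complete + 0) {
        printf "%s: splits sum to %d, eo_data_complete is %d\n", name, total, complete
        bad = 1
      }
    }
    BEGIN { bad = 0 }
    # Section header, e.g. "Uganda_buildings_2020 (Timesteps: 24)".
    /\(Timesteps: [0-9]+\)/ { flush(); name = $1; complete = ""; total = 0; next }
    $1 == "eo_data_complete" { complete = $2; next }
    # Split line, e.g. "✔ training amount: 6445, positive class: 100.0%".
    /amount:/ {
      n = $0
      sub(/.*amount: /, "", n)
      sub(/,.*/, "", n)
      total += n
      next
    }
    END { flush(); exit bad }
  ' "$1"
}
```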