
Data Access Issues #33

Open
mburges-cvl opened this issue Jan 30, 2024 · 3 comments

@mburges-cvl

Hello,

I seem to have some issues with the data access. Maybe you could clarify them for me.

I am able to complete this part:

curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-387.0.0-linux-x86_64.tar.gz
tar -xf google-cloud-cli-387.0.0-linux-x86_64.tar.gz
exec bash
./google-cloud-sdk/install.sh
gcloud init
earthengine authenticate

however, when I try any of these:

gcloud storage mb -l us-central1 $(python -c "from dataops import EE_BUCKET; print(EE_BUCKET)")
gcloud storage mb -l us-central1 $(python -c "from dataops import NPY_BUCKET; print(NPY_BUCKET)")
gcloud storage mb -l us-central1 $(python -c "from dataops import TAR_BUCKET; print(TAR_BUCKET)")

I get the following error:

~/presto$ gcloud storage mb -l us-central1 $(python -c "from dataops import EE_BUCKET; print(EE_BUCKET)")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'dataops'
ERROR: (gcloud) Invalid choice: 'storage'.
Maybe you meant:
  gcloud alpha storage
  gcloud composer environments

To search the help text of gcloud commands, run:
  gcloud help -- SEARCH_TERMS

What exactly am I doing wrong?

Thanks!

@gabrieltseng gabrieltseng self-assigned this Feb 22, 2024
@gabrieltseng
Collaborator

Hi! Apologies for the delay; I somehow missed this issue.

ModuleNotFoundError: No module named 'dataops'

This is a typo in the README - the import path is now presto.dataops

ERROR: (gcloud) Invalid choice: 'storage'.

I suspect this error has to do with the migration from gsutil to the gcloud storage CLI. In recent gcloud versions, alpha no longer seems to be required - which version of gcloud are you using? Based on the error message, though, adding alpha might work. The updated commands would be:

gcloud storage buckets create gs://$(python -c "from presto.dataops import EE_BUCKET; print(EE_BUCKET)") --location=us-central1
gcloud storage buckets create gs://$(python -c "from presto.dataops import NPY_BUCKET; print(NPY_BUCKET)") --location=us-central1
gcloud storage buckets create gs://$(python -c "from presto.dataops import TAR_BUCKET; print(TAR_BUCKET)") --location=us-central1

or, with the alpha:

gcloud alpha storage buckets create gs://$(python -c "from presto.dataops import EE_BUCKET; print(EE_BUCKET)") --location=us-central1
gcloud alpha storage buckets create gs://$(python -c "from presto.dataops import NPY_BUCKET; print(NPY_BUCKET)") --location=us-central1
gcloud alpha storage buckets create gs://$(python -c "from presto.dataops import TAR_BUCKET; print(TAR_BUCKET)") --location=us-central1

But I haven't tested this yet; once I do, I will update the README.
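As a side note, the $(python -c ...) substitution in these commands can be sanity-checked on its own, without gcloud installed. The sketch below uses a hypothetical stand-in value rather than the real constant from presto.dataops:

```shell
# Hypothetical stand-in: 'example-bucket' replaces the real bucket constant
# from presto.dataops, just to show how the substitution builds the gs:// URL.
# (The thread uses `python`; on many systems the interpreter is `python3`.)
BUCKET=$(python3 -c "print('example-bucket')")
echo "gs://${BUCKET}"   # → gs://example-bucket
```

If this prints the expected URL but the real command still fails with ModuleNotFoundError, the issue is the import path (presto.dataops) rather than gcloud itself.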

However, all this step does is create the buckets; another option is to just create them manually in the Google Cloud console - their names are:

I hope this helps, and apologies again for the delay.

@mburges-cvl
Author

Thanks for the answer, but I have another question: what exactly is contained in

presto/data/dynamic_world_samples_active_shards.geojson

and

presto/data/dynamic_world_samples.geojson

?

Are these the individual files used for training (one listing all active shards that can be downloaded, and one listing all shards from the original DW dataset)? So could I hypothetically just download these individual files?

Thanks!

@gabrieltseng
Collaborator

presto/data/dynamic_world_samples.geojson provides all the locations used in the exports.

Yes - you can use this file to just download all the data from earthengine.

presto/data/dynamic_world_samples_active_shards.geojson describes which files we have actually exported and used for training (so can be deleted if you are doing a fresh re-export). A new file with your active shards would be written during the export.
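As a rough illustration of what a GeoJSON file like presto/data/dynamic_world_samples.geojson looks like when inspected, here is a minimal sketch using only the standard library. The two features below are hypothetical stand-ins; the real file's feature count and properties will differ:

```python
import json

# Hypothetical two-feature FeatureCollection standing in for the real file;
# for the actual file you would use: collection = json.load(open(path))
sample = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-122.4, 37.8]},
     "properties": {}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [2.35, 48.85]},
     "properties": {}}
  ]
}
"""

collection = json.loads(sample)
# Each feature's geometry gives one export location.
locations = [f["geometry"]["coordinates"] for f in collection["features"]]
print(len(locations), "export locations")  # → 2 export locations
print(locations[0])                        # → [-122.4, 37.8]
```

Iterating over the features this way is how you could drive your own Earth Engine export from the locations file.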
