earthaccess #181

Merged 29 commits on Dec 4, 2022
Commits (29)
6f57f72
earthdata farewell, earthaccess begins
betolink Nov 6, 2022
5e6f80a
lock python to 3.9
betolink Nov 6, 2022
bc84042
fix workflow
betolink Nov 6, 2022
cfa5fe4
fix tests
betolink Nov 6, 2022
53756b6
installing poetry in CI
betolink Nov 6, 2022
915cfee
fixing renaming typos
betolink Nov 7, 2022
c001c9f
Update README.md
betolink Nov 17, 2022
1ff5d38
Update README.md
betolink Nov 17, 2022
9a8bb99
API refactoring
betolink Nov 23, 2022
715ce7a
adding integration tests
betolink Nov 23, 2022
ac87df4
Update README.md
betolink Nov 29, 2022
527df54
Update README.md
betolink Nov 29, 2022
9d56de6
Update README.md
betolink Nov 29, 2022
5356eeb
Update README.md
betolink Nov 29, 2022
de0f677
Update README.md
betolink Nov 29, 2022
a2c368a
Update README.md
betolink Nov 29, 2022
b348d16
Update README.md
betolink Nov 29, 2022
b0554fb
new API notation, work in progress
betolink Nov 30, 2022
b3c23cf
Merge branch 'earthaccess' into earthaccess-dev
betolink Dec 2, 2022
7c365f7
update README, delete lock for poetry to avoid OS specific conflicts
betolink Dec 2, 2022
9ea15b7
update README, add EDL secrets to repo and consolidated README in docs
betolink Dec 2, 2022
1e9f250
fix test workflow
betolink Dec 2, 2022
8d9d43d
do not fail tests if we can't open a file
betolink Dec 3, 2022
62c7343
same for download, change error for warning
betolink Dec 3, 2022
3b76faf
update README
betolink Dec 4, 2022
fdaaff0
update header
betolink Dec 4, 2022
381e34d
update header
betolink Dec 4, 2022
dae4613
run tests only if we modify code
betolink Dec 4, 2022
5b99585
run tests only if we modify code on PRs
betolink Dec 4, 2022
4 changes: 2 additions & 2 deletions .github/workflows/documentation.yml
@@ -14,8 +14,8 @@ jobs:
python-version: 3.9
channels: conda-forge
mamba-version: "*"
activate-environment: earthdata-dev
environment-file: binder/environment.yml
activate-environment: earthaccess-dev
environment-file: binder/environment-dev.yml
- name: Get full python version
id: full-python-version
run: echo ::set-output name=version::$(python -c "import sys; print('-'.join(str(v) for v in sys.version_info))")
11 changes: 10 additions & 1 deletion .github/workflows/test.yml
@@ -2,15 +2,21 @@ name: Test

on:
push:
paths:
- earthaccess/**
- tests/**
pull_request:
paths:
- earthaccess/**
- tests/**
types: [opened, synchronize]

jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.8, 3.9, '3.10']
python-version: [3.8, 3.9, '3.10', '3.11']
fail-fast: false

steps:
@@ -40,6 +46,9 @@ jobs:
- name: Install Dependencies
run: poetry install
- name: Test
env:
EDL_USERNAME: ${{ secrets.EDL_USERNAME }}
EDL_PASSWORD: ${{ secrets.EDL_PASSWORD }}
run: poetry run bash scripts/test.sh
- name: Upload coverage
uses: codecov/codecov-action@v1
5 changes: 5 additions & 0 deletions .vim/coc-settings.json
@@ -0,0 +1,5 @@
{
"python.linting.pylintEnabled": true,
"python.linting.flake8Enabled": false,
"python.linting.enabled": true
}
14 changes: 9 additions & 5 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
# Changelog

## [UNRELEASED]

* name change

## [v0.4.1] 2022-11-02

* improved documentation:
@@ -51,8 +55,8 @@
- Add basic classes to interact with NASA CMR, EDL and cloud access.
- Basic object formatting.

[Unreleased]: https://github.com/betolink/earthdata/compare/v0.3.0...HEAD
[v0.3.0]: https://github.com/betolink/earthdata/releases/tag/v0.3.0
[v0.2.2]: https://github.com/betolink/earthdata/releases/tag/v0.2.2
[v0.2.1]: https://github.com/betolink/earthdata/releases/tag/v0.2.1
[v0.1.0-beta.1]: https://github.com/betolink/earthdata/releases/tag/v0.1.0-beta.1
[Unreleased]: https://github.com/betolink/earthaccess/compare/v0.3.0...HEAD
[v0.3.0]: https://github.com/betolink/earthaccess/releases/tag/v0.3.0
[v0.2.2]: https://github.com/betolink/earthaccess/releases/tag/v0.2.2
[v0.2.1]: https://github.com/betolink/earthaccess/releases/tag/v0.2.1
[v0.1.0-beta.1]: https://github.com/betolink/earthaccess/releases/tag/v0.1.0-beta.1
10 changes: 5 additions & 5 deletions CONTRIBUTING.md
@@ -9,19 +9,19 @@ Please note we have a code of conduct, please follow it in all your interactions
## Development environment


`earthdata` is a Python library that uses Poetry to build and publish the package to PyPI, the defacto Python repository. In order to develop new features or patch bugs etc. we need to set up a virtual environment and install the library locally. We can accomplish this with both Poetry or/and Conda.
`earthaccess` is a Python library that uses Poetry to build and publish the package to PyPI, the de facto Python package repository. To develop new features or patch bugs, we need to set up a virtual environment and install the library locally. We can accomplish this with Poetry, Conda, or both.

### Using Conda

If we have `miniconda` installed we can use the environment file included in the binder folder, this will install all the libraries we need (including Poetry) to start developing `earthdata`
If we have `mamba` (or conda) installed, we can use the environment file included in the binder folder; it will install all the libraries we need (including Poetry) to start developing `earthaccess`.

```bash
>conda env update -f binder/environment.yml
>conda activate earthdata-dev
>mamba env update -f binder/environment-dev.yml
>mamba activate earthaccess-dev
>poetry install
```

After activating our environment and installing the library with Poetry, we can run JupyterLab and start testing the local distribution, or we can use the scripts inside `scripts` to run the tests and linting.
Now we can create a feature branch and push those changes to our fork!


10 changes: 8 additions & 2 deletions Makefile
@@ -27,6 +27,12 @@ python-three-nine: ## setup python3.9 virtual environment using poetry
poetry env use python3.9
poetry install

python-three-ten:
python-three-ten: ## setup python3.10 virtual environment using poetry
poetry env use python3.10
poetry install




pre-commit:
@@ -58,5 +64,5 @@ deploy-docs:

install: ## uninstall and install package with python
install:
poetry remove ./earthdata
poetry add ./earthdata
poetry remove ./earthaccess
poetry add ./earthaccess
162 changes: 107 additions & 55 deletions README.md
@@ -1,118 +1,157 @@
# earthdata 🌍

<p align="center">
<em>Client library for NASA CMR and EDL APIs</em>
<img alt="earthaccess, a python library to search, download or stream NASA Earth science data with just a few lines of code" src="https://user-images.githubusercontent.com/717735/205517116-7a5d0f41-7acc-441e-94ba-2e541bfb7fc8.png" width="70%" align="center" />
</p>

<p align="center">
<a href="https://github.com/betolink/earthdata/actions?query=workflow%3ATest" target="_blank">
<img src="https://github.com/betolink/earthdata/workflows/Test/badge.svg" alt="Test">
</a>
<a href="https://github.com/betolink/earthdata/actions?query=workflow%3APublish" target="_blank">
<img src="https://github.com/betolink/earthdata/workflows/Publish/badge.svg" alt="Publish">

<a href="https://twitter.com/allison_horst" target="_blank">
<img src="https://img.shields.io/badge/Art%20By-Allison%20Horst-red" alt="Art Designer: Allison Horst">
</a>

<a href="https://pypi.org/project/earthdata" target="_blank">
<img src="https://img.shields.io/pypi/v/earthdata?color=%2334D058&label=pypi%20package" alt="Package version">
</a>

<a href="https://pypi.org/project/earthdata/" target="_blank">
<img src="https://img.shields.io/pypi/pyversions/earthdata.svg" alt="Python Versions">
</a>
<a href="https://github.com/psf/black" target="_blank">
<img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black">
</a>

<a href="https://nsidc.github.io/earthdata/" target="_blank">
<img src="https://readthedocs.org/projects/earthdata/badge/?version=latest&style=plastic" alt="Documentation link">
</a>

</p>

## **Overview**

*earthaccess* is a **python library to search, download or stream NASA Earth science data** with just a few lines of code.

## Overview

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/betolink/earthdata/main)
In the age of cloud computing, the power of open science only reaches its full potential if we have easy-to-use workflows that facilitate research in an inclusive, efficient and reproducible way. Unfortunately, as it stands today, scientists and students alike face a steep learning curve adapting to systems that have grown too complex, and they end up spending more time on the technicalities of the tools, the cloud and NASA APIs than on their science.

A Python library to search and access NASA datasets.
During several workshops organized by [NASA Openscapes](https://nasa-openscapes.github.io/events.html), the need to provide easy-to-use tools to our users became evident. Open science is a collaborative effort; it involves people from different technical backgrounds. Data analysis for the pressing problems we face cannot be limited by the complexity of the underlying systems, and providing easy access to NASA Earthdata is therefore the main motivation behind this library.

## Installing earthdata
## **Installing earthaccess**

Install the latest release:
Install the latest release using conda

```bash
conda install -c conda-forge earthdata
conda install -c conda-forge earthaccess
```

Or you can clone `earthdata` and get started locally
Using pip

```bash
pip install earthaccess
```

Try it in your browser without installing anything! [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nsidc/earthdata/main)


## **Usage**

# ensure you have Poetry installed
pip install --user poetry

# install all dependencies (including dev)
poetry install
With *earthaccess* we can log in, search and download data with a few lines of code, and, even more relevant, our code will work the same way whether we are running it in the cloud or from our laptop. ***earthaccess*** handles authentication with [NASA's Earthdata Login (EDL)](https://urs.earthdata.nasa.gov), search using NASA's [CMR](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html) and access through [`fsspec`](https://github.com/fsspec/filesystem_spec).

# develop!
The only requirement to use this library is to open a free account with NASA [EDL](https://urs.earthdata.nasa.gov).



### **Authentication**

Once you have an EDL account, you can authenticate using one of the following three methods:

1. Using a `.netrc` file
* *earthaccess* can read your EDL credentials (username and password) from a `.netrc` file
2. Reading your EDL credentials from environment variables
* If available, you can use the environment variables **EDL_USERNAME** and **EDL_PASSWORD** (see the second example below)
3. Interactively entering your EDL credentials
* You can be prompted for these credentials and save them to a `.netrc` file

```python
import earthaccess

auth = earthaccess.login(strategy="netrc")
if not auth:
auth = earthaccess.login(strategy="interactive", persist=True)
```
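
For the environment-variable method (2 above), a minimal sketch could look like the following; the `"environment"` strategy name is an assumption and may differ from the released API:

```python
import earthaccess

# Assumes the EDL_USERNAME and EDL_PASSWORD environment variables are set;
# the "environment" strategy name is an assumption based on the list above.
auth = earthaccess.login(strategy="environment")
```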

## Example Usage
Once you are authenticated with NASA EDL you can:

* Get a file from a DAAC using an `fsspec` session.
* Request temporary S3 credentials from a particular DAAC (needed to download or stream data from an S3 bucket in the cloud); see the sketch after this list.
* Use the library to download or stream data directly from S3.
* Regenerate CMR tokens (used for restricted datasets)

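As an illustration, here is a minimal sketch of requesting temporary S3 credentials; the `get_s3_credentials` method name and its `daac` parameter are assumptions about the `Auth` object returned by `login`:

```python
import earthaccess

auth = earthaccess.login(strategy="netrc")

# Hypothetical call: temporary S3 credentials scoped to a given DAAC,
# usable with boto3 or s3fs from within us-west-2.
s3_credentials = auth.get_s3_credentials(daac="NSIDC")
print(list(s3_credentials.keys()))
```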

### **Searching for data**

Once we have selected our dataset, we can search for its data granules using *doi*, *short_name* or *concept_id*.
If we are not sure how to search for a particular dataset, we can start with the "searching for data" tutorial or use the [Earthdata search portal](https://search.earthdata.nasa.gov/). For a complete list of the search parameters we can use, visit the extended API documentation.

```python
from earthdata import Auth, DataGranules, DataCollections, Store

auth = Auth().login(strategy="netrc") # if we want to access NASA DATA in the cloud
results = earthaccess.search_data(
short_name='ATL06',
version="005",
cloud_hosted=True,
bounding_box=(-10, 20, 10, 50),
temporal=("2020-02", "2020-03"),
count=100
)

# To search for collections (datasets)

DatasetQuery = DataCollections().keyword('MODIS').bounding_box(-26.85,62.65,-11.86,67.08)
```
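
To search for collections (datasets) with the new top-level notation, a sketch might look like the following; the `search_datasets` name and its parameters are assumptions mirroring `search_data`:

```python
import earthaccess

# Hypothetical collection (dataset) search mirroring the granule search above
datasets = earthaccess.search_datasets(
    keyword="MODIS",
    bounding_box=(-26.85, 62.65, -11.86, 67.08),
    cloud_hosted=True,
    count=10,
)
```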

Now that we have our results, we can do multiple things: we can iterate over them to get HTTP (or S3) links, we can download the files to a local folder, or we can open these files and stream their content directly to other libraries, e.g. xarray.

counts = DatasetQuery.hits()
collections = DatasetQuery.get()
### **Accessing the data**

**Option 1: Using the data links**

# To search for granules (data files)
GranuleQuery = DataGranules().concept_id('C1711961296-LPCLOUD').bounding_box(-10,20,10,50)
If we already have a workflow in place for downloading our data, we can use *earthaccess* as a search-only library and get HTTP links from our query results. This could be the case if our current workflow uses a different language and we only need the links as input.

# number of granules (data files) that matched our criteria
counts = GranuleQuery.hits()
# We get the metadata
granules = GranuleQuery.get(10)
```python

# earthdata provides some convenience functions for each data granule
data_links = [granule.data_links(access="direct") for granule in granules]
# If the dataset is cloud hosted, there will be S3 links available. The access parameter accepts "direct" or "external"; direct access is only possible if you are in the us-west-2 region in the cloud.
data_links = [granule.data_links(access="direct") for granule in results]

# or if the data is an on-prem dataset
data_links = [granule.data_links(access="external") for granule in results]

data_links = [granule.data_links(access="onprem") for granule in granules]
```

# The Store class allows to get the granules from on-prem locations with get()
# NOTE: Some datasets require users to accept a Licence Agreement before accessing them
store = Store(auth)
> Note: *earthaccess* can get S3 credentials for us, or authenticated HTTP sessions in case we want to use them with a different library.

# This works with both, on-prem or cloud based collections**
store.get(granules, local_path='./data')
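
For example, here is a minimal sketch of downloading one of those links with plain `requests`; it assumes EDL credentials are available in `~/.netrc` and that the hosting DAAC honors them across redirects:

```python
import requests

# `data_links` comes from the query above; each granule can expose several links
url = data_links[0][0]

# requests falls back to ~/.netrc for authentication when no explicit auth is given
with requests.get(url, stream=True, allow_redirects=True) as r:
    r.raise_for_status()
    with open(url.split("/")[-1], "wb") as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)
```
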
**Option 2: Download data to a local folder**

# if you're in a AWS instance (us-west-2) you can use open() to get a fileset of S3 files!
fileset = store.open(granules)
This option is practical if you have the necessary space available on disk; the *earthaccess* library will print out the approximate size of the download and its progress.
```python
files = earthaccess.download(results, "./local_folder")

# Given that this is gridded data (Level 3 or up) we could
xarray.open_mfdataset(fileset, combine='by_coords')
```
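
As a follow-up sketch, and assuming the downloaded granules are gridded files that xarray can read, the local paths returned by `download` can be opened directly:

```python
import xarray as xr

# `files` is the list of local paths returned by earthaccess.download above
ds = xr.open_mfdataset(files, combine="by_coords")
```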

For more examples see the `Demo` and `EarthdataSearch` notebooks.
**Option 3: Direct S3 Access - Stream data directly to xarray**

This method works best if you are in the same region as the data (us-west-2) and you are working with gridded datasets (processing level 3 and above).

Only **Python 3.8+** is supported.
```python
import xarray as xr

ds = xr.open_mfdataset(earthaccess.open(results, auth=auth), engine="scipy")
```

## Code of Conduct

See [Code of Conduct](CODE_OF_CONDUCT.md)
And that's it! Just one line of code, and this same piece of code will also work for data that are not hosted in the cloud, i.e. NASA-hosted on-prem datasets.

## Level of Support

* This repository is not actively supported by NSIDC but we welcome issue submissions and pull requests in order to foster community contribution.
> More examples coming soon!


### Compatibility

Only **Python 3.8+** is supported.

<img src="docs/nsidc-logo.png" width="84px" />



@@ -125,3 +164,16 @@ See [Code of Conduct](CODE_OF_CONDUCT.md)
Welcome! 😊👋

> Please see the [Contributing Guide](CONTRIBUTING.md).

### [Project Board](https://github.com/nsidc/earthdata/discussions).

### Glossary

<a href="https://www.earthdata.nasa.gov/learn/glossary"><img src="https://auth.ops.maap-project.org/cas/images/urs-logo.png" /></a>

## Level of Support

* This repository is not actively supported by NSIDC but we welcome issue submissions and pull requests in order to foster community contribution.

<img src="https://raw.githubusercontent.com/nsidc/earthdata/main/docs/nsidc-logo.png" width="84px" />

16 changes: 16 additions & 0 deletions binder/environment-dev.yml
@@ -0,0 +1,16 @@
name: earthaccess-dev
channels:
- conda-forge
dependencies:
# This environment bootstraps poetry, the actual dev environment
# is installed and managed with poetry
- python=3.9
- jupyterlab=3
- xarray>=0.19
- matplotlib-base>=3.3
- cartopy>=0.18.0
- ipyleaflet>=0.13
- h5netcdf>=0.11
- pip
- pip:
- poetry