docs: move & improve STAC notebook #71

Merged 1 commit on Nov 12, 2024
2 changes: 1 addition & 1 deletion docs/_toc.yml
@@ -15,11 +15,11 @@ parts:
- file: content/02/05_00_S1_SurfMI
- file: content/02/06_00_S1_Coherence
- file: content/02/07_00_Copernicus_DEM
- file: content/02/08_00_STAC_data
- caption: How to...
chapters:
- file: content/03/01_00_Override_Params
- file: content/03/02_00_Dask_Dashboard
- file: content/03/03_00_Clip_to_vec
- file: content/03/04_00_Spyndex
- file: content/03/05_00_Count_valid
- file: content/03/06_00_STAC_data
Original file line number Diff line number Diff line change
@@ -4,30 +4,32 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# ...load data from remote STAC Catalogs?"
"# Remote STAC Catalogs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to load data products from remote [SpatioTemporal Asset Catalogs (STAC)](https://stacspec.org/en/), we can make use of the `load_from_stac` function provided by the `sdc-tools` package. Currently, this function supports loading data products hosted by [Microsoft Planetary Computer (MPC)](https://planetarycomputer.microsoft.com/catalog) and [Digital Earth Africa (DEA)](https://explorer.digitalearth.africa/).\n"
"In order to load data products from remote [SpatioTemporal Asset Catalogs (STAC)](https://stacspec.org/en/), we can make use of the `load_from_stac`-function provided by the `sdc-tools` package. Currently, this function supports loading data products hosted by [Microsoft Planetary Computer (MPC)](https://planetarycomputer.microsoft.com/catalog) and [Digital Earth Africa (DEA)](https://explorer.digitalearth.africa/).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{warning}\n",
"Please be aware that working with remote data products might be quite inefficient. This is especially true if the data is loaded with inappropriately chosen parameters. Before loading a data product, you should get to know its basic characteristics. If you know the answer to at least the following questions, you are good to go:\n",
"Up until now, we have worked with data products that are hosted on our local file servers. The loading of these is optimized by the `sdc-tools` package. In case of remote data products, **you are responsible** for choosing the right parameters for loading the data. An inappropriate choice can lead to long loading times and high memory usage. Create an issue or contact me directly if you have any questions.\n",
"\n",
"Before loading a remote data product, you should get to know some of its basic characteristics. If you know the answer to at least the following questions, you are good to go:\n",
"- **What is the pixel spacing / resolution of the data?** \n",
" - Override the default `resolution` parameter if necessary.\n",
"- **Is the data categorical or continuous?** E.g., land cover is categorical, while spectral bands are continuous.\n",
" - If the data is categorical you should override the default `resampling` method to `'nearest'`.\n",
"- **In which datatype is the data stored and are there differences between the bands?** Common types are `uint8`, `uint16` and `float32`, for example. \n",
" - If there are differences in datatype between the bands you're interested in, it's probably best to load these separately by specifying the `bands` parameter and using the appropriate `dtype` for each band.\n",
"- **Is the data categorical/discrete or continuous?** E.g., land cover is categorical, while spectral bands are continuous.\n",
" - If the data is categorical you should override the default `resampling` method to `'nearest'`. [Here](https://gisgeography.com/raster-resampling) you can find a short summary of a few common resampling methods.\n",
"- **In which data type is the product stored and are there differences between the bands?** Common types are `uint8`, `uint16` and `float32`, for example. \n",
" - If there are differences in data types between the bands you're interested in, it's probably best to load these separately by specifying the `bands` parameter and using the appropriate `dtype` for each band.\n",
"\n",
"You should get an idea of how to handle these cases by having a look at the examples below. If something is unclear, please let me know!\n",
"You should get an idea of how to handle these cases by having a look at the examples below. I also recommend reading the guide on how to {ref}`override-defaults` if you haven't already.\n",
"```"
]
},
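A minimal, library-free sketch of why the warning above recommends `'nearest'` for categorical data. It does not show how `sdc-tools` resamples internally; the class codes and the 2-pixel block size are made up for illustration.

```python
# Toy 1-D "land cover" row with two class codes (1 and 3); values are hypothetical.
classes = [1, 3, 3, 3]

# Downsample by a factor of 2: group neighbouring pixels into blocks of two.
blocks = [classes[i:i + 2] for i in range(0, len(classes), 2)]

# Averaging-style resampling: the mean of a mixed block is a new number.
averaged = [sum(block) / len(block) for block in blocks]

# Nearest-style resampling: each output pixel copies one existing input pixel.
nearest = [block[0] for block in blocks]

# Any averaged value that is not one of the original class codes is meaningless.
invalid = [value for value in averaged if value not in set(classes)]

print("averaged:", averaged)
print("nearest: ", nearest)
print("invalid class codes after averaging:", invalid)
```

The mixed block `[1, 3]` averages to a code that exists nowhere in the input, whereas the nearest-style result only ever contains codes that were already present.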
@@ -36,7 +38,9 @@
"metadata": {},
"source": [
"```{note}\n",
"In both examples we will use the bounding box of an entire SALDi site. If you have a specific area of interest, you can replace the bounding box with your own, e.g., by using the utility function `sdc.vec.get_vec_bounds`. In general it is recommended to try things out on a small subset first, before scaling up to larger areas and time periods.\n",
"In both examples we will use the bounding box of an entire SALDi site. If you have a specific area of interest, you can replace the bounding box with your own, e.g., by using the utility function `sdc.vec.get_vec_bounds` to generate a bounding box from a vector file.\n",
"\n",
"In general it is recommended to test on a small subset first, before scaling up to larger areas and time periods.\n",
"```"
]
},
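As a rough sanity check for the `resolution` question: the examples in this notebook use `EPSG:4326`, so `resolution` is given in degrees rather than meters. The sketch below converts degrees to approximate meters on a sphere. The function name, the mean Earth radius, and the latitude of -25° are assumptions for illustration and are not part of `sdc-tools`.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def degrees_to_meters(resolution_deg, latitude_deg=0.0):
    """Return the approximate (north-south, east-west) pixel size in meters."""
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = resolution_deg * meters_per_degree
    # Meridians converge towards the poles, so east-west spacing shrinks with latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# The two resolutions used in the examples, evaluated at an illustrative latitude.
for resolution in (0.005, 0.0001):
    ns, ew = degrees_to_meters(resolution, latitude_deg=-25.0)
    print(f"{resolution} deg -> {ns:.1f} m north-south, {ew:.1f} m east-west")
```

Under these assumptions, 0.005° comes out at somewhat over 500 m and 0.0001° at roughly 10 m, which is consistent with the "approx." comments in the `override_defaults` dictionaries.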
@@ -618,10 +622,12 @@
"\n",
"bounds = get_site_bounds(site=\"site06\")\n",
"time_range = (\"2018\", \"2023\")\n",
"override_defaults = {'crs': 'EPSG:4326', # this is already the default, but just to be explicit I wanted to show it here\n",
" 'resolution': 0.005, # equal to approx. 500 m pixel spacing, similar to the original data\n",
" 'resampling': 'nearest', # the data is categorical, so `nearest` resampling is appropriate!\n",
" 'chunks': {'time': 'auto', 'y': 'auto', 'x': 'auto'}} # if you're not sure, you can set all to 'auto'\n",
"override_defaults = {\n",
" 'crs': 'EPSG:4326', # this is already the default, but just to be explicit I wanted to show it here\n",
" 'resolution': 0.005, # equal to approx. 500 m pixel spacing, similar to the original data\n",
" 'resampling': 'nearest', # the data is categorical, so `nearest` resampling is appropriate!\n",
" 'chunks': {'time': 'auto', 'y': 'auto', 'x': 'auto'} # if you're not sure, you can set all to 'auto'\n",
" } \n",
"\n",
"modis_burned = load_from_stac(\n",
" stac_endpoint='pc',\n",
@@ -1775,10 +1781,12 @@
"\n",
"bounds = get_site_bounds(site=\"site04\")\n",
"time_range = (\"2019\", \"2019\") # only a mask for the year 2019 is available\n",
"override_defaults = {'crs': 'EPSG:4326', # this is already the default, but just to be explicit I wanted to show it here\n",
" 'resolution': 0.0001, # equal to approx. 10 m pixel spacing, similar to the original data\n",
" 'resampling': 'nearest', # the data is categorical, so `nearest` resampling is appropriate!\n",
" 'chunks': {'time': -1, 'y': -1, 'x': -1}} # it's a single time slice and the dtype is uint8 (\"smaller\" data), so we can load it all into one chunk\n",
"override_defaults = {\n",
" 'crs': 'EPSG:4326', # this is already the default, but just to be explicit I wanted to show it here\n",
" 'resolution': 0.0001, # equal to approx. 10 m pixel spacing, similar to the original data\n",
" 'resampling': 'nearest', # the data is categorical, so `nearest` resampling is appropriate!\n",
" 'chunks': {'time': -1, 'y': -1, 'x': -1} # single time slice with a small dtype (uint8), so loading it into one chunk should be fine\n",
" } \n",
"\n",
"crop_2019 = load_from_stac(\n",
" stac_endpoint='deafrica',\n",
@@ -2320,13 +2328,6 @@
"source": [
"crop_2019.mask.plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
1 change: 1 addition & 0 deletions docs/content/03/01_00_Override_Params.md
@@ -1,4 +1,5 @@

(override-defaults)=
# ...use other loading parameters with `load_product`?

```{warning}