Merge pull request #3 from NASA-IMPACT/tiling-skeleton
Tiling skeleton
abarciauskas-bgse authored Aug 4, 2023
2 parents b12f611 + 1c14907 commit 0b7c806
Showing 12 changed files with 69 additions and 17 deletions.
18 changes: 12 additions & 6 deletions _quarto.yml
@@ -23,18 +23,24 @@ website:

style: "docked"
search: true
collapse-level: 3
collapse-level: 2
contents:
- href: index.qmd
text: Welcome
- section: approaches/index.qmd
contents:
- section: approaches/direct-client.md
- section: approaches/tiling.md
contents:
- section: approaches/tiling.qmd
contents:
- approaches/tiling/preprocessing.md
- approaches/tiling/benchmarks.md
- approaches/tiling/performance-methodology.md
- approaches/tiling/results-summary.md
- section: Results in Detail
contents:
- approaches/tiling/pgstac-cog.md
- approaches/tiling/zarr.md
- approaches/tiling/e2e.md
- approaches/tiling/recommendations.md
- approaches/tiling/future-areas.md
- section: approaches/direct-client.md



6 changes: 4 additions & 2 deletions approaches/index.qmd
@@ -4,5 +4,7 @@ title: Approaches

For browser-based visualization of Zarr, there are 2 approaches covered in this cookbook:

1. [Direct Client](./direct-client.md)
2. [Tiling](./tiling.md)
1. [Tiling](./tiling.qmd)
2. [Direct Client](./direct-client.md)

The tile server provides an API that works with many map clients, but it requires deploying and maintaining a tile server. Also, the response delivered to the client is a rendered image, not the raw data itself. The direct client has access to the underlying data and thus gives the user maximum flexibility in rendering and analysis.
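To see why an image response differs from the raw data, consider the rescaling step that typically turns data values into 8-bit pixel values before a tile is encoded as PNG or a similar format. The minimal sketch below uses a hypothetical `rescale_to_uint8` helper with a simple linear rescale; it is an illustrative assumption, not titiler's implementation. It shows that distinct data values can collapse to the same pixel value, so a client receiving the image cannot recover the originals.

```python
def rescale_to_uint8(values, vmin, vmax):
    """Linearly map values in [vmin, vmax] to integers 0-255, clamping outliers.

    Hypothetical helper illustrating the lossy data-to-pixel step of a tile server.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    out = []
    for v in values:
        scaled = (v - vmin) / (vmax - vmin) * 255
        # Clamp to the valid byte range, then round to the nearest integer.
        out.append(round(min(max(scaled, 0.0), 255.0)))
    return out


print(rescale_to_uint8([-1, 0, 5, 10, 20], 0, 10))  # [0, 0, 128, 255, 255]
print(rescale_to_uint8([5.0, 5.01], 0, 10))         # [128, 128]
```

Here 5.0 and 5.01 both become 128: the difference between them is lost once the tile is rendered, whereas a direct client reading the array still sees both values.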
5 changes: 4 additions & 1 deletion approaches/tiling.md → approaches/tiling.qmd
@@ -1,5 +1,8 @@
# Tiling
---
title: "Tiling"
---

The tiling approach serves image tiles through the [XYZ protocol](https://en.wikipedia.org/wiki/Tiled_web_map) and the [OGC API - Tiles](https://docs.ogc.org/is/20-057/20-057.html) specification.
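In the XYZ scheme, each tile is addressed by a zoom level `z` and a column/row pair `x`, `y` on a Web Mercator grid of 2^z × 2^z tiles, with `y` counted from the north. As a minimal sketch, the standard conversions look like the following; the helper names are hypothetical and not part of rio-tiler or titiler-xarray, which rely on the morecantile library for tile-grid math.

```python
import math

# Web Mercator cannot represent the poles; XYZ grids are clipped at about ±85.0511°.
MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edges map to the last tile, not n.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_bounds(x: int, y: int, zoom: int) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) in degrees for XYZ tile (x, y, zoom)."""
    n = 2 ** zoom

    def lat_of(row: float) -> float:
        # Inverse Web Mercator for a (possibly fractional) tile row.
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, lat_of(y + 1), east, lat_of(y)


print(lonlat_to_tile(-122.4194, 37.7749, 10))  # (163, 395)
```

A tile request such as `/{z}/{x}/{y}` therefore identifies a fixed geographic box, and the server only needs to read the data overlapping that box.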

This approach relies on the [`rio_tiler.io.XarrayReader`](https://github.com/cogeotiff/rio-tiler/blob/main/rio_tiler/io/xarray.py) class, which provides a `tile` method. This method and others in that module are used to provide an API for tiles. An example API infrastructure can be found in [titiler-xarray](https://github.com/developmentseed/titiler-xarray). Please note that titiler-xarray is still in development and is not intended for production use at this time.
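Conceptually, rendering one tile means locating every output pixel geographically and sampling the source array at that location. The sketch below illustrates only that idea, for a regular latitude/longitude grid with nearest-neighbour lookup. It is not how `XarrayReader.tile` works internally (rio-tiler performs proper reprojection and resampling, and the server then encodes the result as an image); `pixel_center_lonlat`, `render_tile` and the 256-pixel default are hypothetical names and assumptions.

```python
import math

TILE_SIZE = 256  # common XYZ tile width/height in pixels (assumption)


def pixel_center_lonlat(x: int, y: int, z: int, px: int, py: int,
                        size: int = TILE_SIZE) -> tuple[float, float]:
    """Lon/lat (degrees) of the center of pixel (px, py) in XYZ tile (x, y, z)."""
    n = 2 ** z
    # Fractional tile coordinates of the pixel center on the global grid.
    fx = x + (px + 0.5) / size
    fy = y + (py + 0.5) / size
    lon = fx / n * 360.0 - 180.0
    # Inverse Web Mercator: convert the row position back to latitude.
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * fy / n))))
    return lon, lat


def render_tile(grid, lon0: float, lat0: float, res: float,
                x: int, y: int, z: int, size: int = TILE_SIZE, nodata=None):
    """Nearest-neighbour sample of a regular lat/lon grid into one XYZ tile.

    grid[i][j] is the value of the cell centred at (lat0 + i*res, lon0 + j*res),
    so latitude increases with i. Pixels falling outside the grid get `nodata`.
    Returns `size` rows ordered north to south, matching image row order.
    """
    rows, cols = len(grid), len(grid[0])
    tile = []
    for py in range(size):
        row = []
        for px in range(size):
            lon, lat = pixel_center_lonlat(x, y, z, px, py, size)
            i = round((lat - lat0) / res)  # nearest grid row
            j = round((lon - lon0) / res)  # nearest grid column
            inside = 0 <= i < rows and 0 <= j < cols
            row.append(grid[i][j] if inside else nodata)
        tile.append(row)
    return tile
```

For example, `render_tile(grid, -179.5, -89.5, 1.0, 0, 0, 0)` samples a global 1° grid into the single zoom-0 tile. A production reader replaces the per-pixel loop with vectorized warping and a choice of resampling method.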

3 changes: 0 additions & 3 deletions approaches/tiling/benchmarks.md

This file was deleted.

3 changes: 3 additions & 0 deletions approaches/tiling/e2e.md
@@ -0,0 +1,3 @@
# End-to-End Tests

From https://github.com/developmentseed/tile-benchmarking/tree/main/e2e/README.md
19 changes: 19 additions & 0 deletions approaches/tiling/performance-methodology.md
@@ -0,0 +1,19 @@
# Performance Methodology

## Datasets

Reproducibility is important to the integrity of this project and its reported results. That is why we selected a publicly available dataset and have tried to document every step so that it can be reproduced. We used the [NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP-CMIP6)](https://aws.amazon.com/marketplace/pp/prodview-k6adk576fiwmm#overview) AWS public dataset for this project.

There are 2 datasets listed on AWS. The first is an archive of NetCDF files from about 35 climate models, each supplying daily historical and projected values for up to 9 environmental variables from 1950 to 2100. To minimize preprocessing, test datasets were generated for the first 2 years of historical data, for 1 model and 1 variable. The variable or dataset could easily be swapped, but we expect the relative performance across datasets to be the same.

In addition to the NetCDF data, there is an archive of COGs generated from that NetCDF data to support visualization via dynamic tiling. COGs are only available for 2 models, so to compare tiling from COGs with tiling from Zarr, one of those models (`GISS-E2-1-G`) is used to generate the Zarr stores.

At this time, a different model (`ACCESS-CM2`) is used for the direct client approach, but we will demonstrate that tiling performance does not differ meaningfully between these models.
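One simple way to check a claim like the one above is to time identical tile requests against each model's store and compare a robust summary such as the median. In the sketch below, the helper names, the use of the median and the 10% tolerance are assumptions for illustration, not the criterion used for the results in this cookbook.

```python
import statistics
import time
from typing import Callable, Iterable, Sequence


def time_calls(fn: Callable[..., object], calls: Iterable[tuple],
               repeats: int = 3) -> list[float]:
    """Return the wall-clock duration (seconds) of fn(*args) for every call and repeat."""
    durations = []
    for args in calls:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(*args)
            durations.append(time.perf_counter() - start)
    return durations


def relative_median_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Difference between the two medians, relative to the median of `a`."""
    median_a = statistics.median(a)
    return abs(statistics.median(b) - median_a) / median_a


def no_meaningful_difference(a: Sequence[float], b: Sequence[float],
                             tolerance: float = 0.10) -> bool:
    """True if the medians differ by at most `tolerance` (10% by default, an arbitrary choice)."""
    return relative_median_difference(a, b) <= tolerance
```

For example, calling `time_calls(fetch_tile, [(0, 0, 0), (1, 0, 1)])` with a hypothetical `fetch_tile(z, x, y)` for each model yields two lists of durations to pass to `no_meaningful_difference`. The median is used because a few slow outliers (cold caches, network hiccups) would otherwise dominate a mean.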

Benchmarks for the tiling approach were run on the [VEDA JupyterHub](https://nasa-veda.2i2c.cloud/). The [VEDA documentation](https://nasa-impact.github.io/veda-docs/services/jupyterhub.html) details how to request access. Because the database and the S3 bucket holding the datasets are not fully public, you must be logged into the VEDA JupyterHub to run those benchmarks.

## Performance Testing Methodology

### Code Profiling

### E2E Testing
3 changes: 3 additions & 0 deletions approaches/tiling/pgstac-cog.md
@@ -0,0 +1,3 @@
# pgSTAC + COG Results

From https://github.com/developmentseed/tile-benchmarking/blob/main/profiling/profile.ipynb
3 changes: 0 additions & 3 deletions approaches/tiling/preprocessing.md

This file was deleted.

3 changes: 3 additions & 0 deletions approaches/tiling/recommendations.md
@@ -0,0 +1,3 @@
# Recommendations

TBD
3 changes: 3 additions & 0 deletions approaches/tiling/results-summary.md
@@ -0,0 +1,3 @@
# Results Summary

From https://github.com/developmentseed/tile-benchmarking/blob/main/profiling/profile.ipynb and https://github.com/developmentseed/tile-benchmarking/blob/main/e2e/read-results.ipynb
3 changes: 3 additions & 0 deletions approaches/tiling/zarr.md
@@ -0,0 +1,3 @@
# Zarr Test Results

From https://github.com/developmentseed/tile-benchmarking/blob/main/profiling/profile.ipynb
17 changes: 15 additions & 2 deletions index.qmd
@@ -3,8 +3,21 @@ title: "Zarr Visualization Cookbook"
subtitle: "Methods and benchmarks for visualizing Zarr"
---

# Zarr Visualization Cookbook

This site documents different approaches and benchmarks for Zarr visualization.

The intention is to provide Zarr data providers with methods for visualizing Zarr. This guide is intended to inform Zarr data producers who want to understand the preprocessing and chunking requirements for supporting visualization through the tile server and direct client approaches.

# Background

Visualization of Earth science data is key to exploring and understanding it. Web browsers offer a near-universal platform for exploring this data. However, web users expect near-instantaneous page rendering.

The scale of geospatial data makes it challenging to serve this data “on-demand” as browsers cannot reproject and create image tiles fast enough for a good user experience. This challenge led to the development of pre-generated static map tiles.

While pre-generated map tiles make it possible to visualize data quickly, they have drawbacks. The most significant is that the data provider chooses how the data will appear: the user has no power to adjust the visualization, such as modifying the color scale or color map, or performing “band math” where multiple variables are combined to produce a new variable. Next-generation approaches give that power to the user. Other drawbacks affect the data provider, such as storage costs and maintaining a pipeline to continually update or reprocess the stored tiles as data is added or revised.

In recent years, the dynamic tiling approach, which creates map tiles on demand, has proven successful. This approach has traditionally relied on reading data from Cloud-Optimized GeoTIFFs (COGs). As the Zarr format gained popularity for large-scale n-dimensional data analysis, users started to ask for browser-based visualization. However, the chunk size conventionally used for analysis (~100 MB) is generally acknowledged to be too large for a browser to fetch.
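To make the chunk-size concern concrete: a chunk's uncompressed size is the product of its shape and the bytes per element. The sketch below assumes float32 data on a global 0.25° grid of 600 × 1440 cells (roughly the NEX-GDDP-CMIP6 layout, but treat the exact dimensions as an assumption); compression would reduce the bytes actually transferred.

```python
import math

# Assumed layout: float32 values on a global 0.25° grid of 600 x 1440 cells.
ITEMSIZE = 4  # bytes per float32 element
SPATIAL_SHAPE = (600, 1440)


def chunk_nbytes(shape, itemsize=ITEMSIZE):
    """Uncompressed size in bytes of one chunk with the given shape."""
    return math.prod(shape) * itemsize


def max_time_steps(budget_bytes, spatial_shape=SPATIAL_SHAPE, itemsize=ITEMSIZE):
    """Largest number of full-grid time steps that fit in one chunk of budget_bytes."""
    return budget_bytes // chunk_nbytes(spatial_shape, itemsize)


print(chunk_nbytes((1, *SPATIAL_SHAPE)))   # one day: 3456000 bytes (~3.5 MB)
print(chunk_nbytes((30, *SPATIAL_SHAPE)))  # 30 days: 103680000 bytes (~104 MB)
print(max_time_steps(100_000_000))         # 28
```

Under these assumptions a single daily time step is about 3.5 MB uncompressed, while packing about 30 days into one chunk already reaches the ~100 MB scale, so a browser rendering one day's map would download roughly 30 times more data than it displays.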

There are now 2 options: a dynamic tile server and a direct client. For the first, rio-tiler's `XarrayReader` supports rendering tiles from anything xarray can read, which means a tile server can render tiles from Zarr stores as well as NetCDF4/HDF5 and other formats. However, this option still requires running a server. The second option, a “direct client”, reads Zarr directly in the browser and uses WebGL to render map tiles.
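With the direct client approach, the browser works out which chunks of the Zarr array overlap the requested region and fetches only those objects. A minimal sketch of that lookup, written in Python for illustration (a real browser client would do this in JavaScript), assuming the Zarr v2 default chunk-key format where chunk indices are joined with `.` (for example `0.1.2`); `chunk_keys` and its arguments are hypothetical.

```python
import itertools


def chunk_keys(region, chunks, separator="."):
    """Keys of all chunks that intersect a region of an n-dimensional array.

    region: one (start, stop) half-open index range per dimension.
    chunks: chunk length per dimension.
    """
    if len(region) != len(chunks):
        raise ValueError("region and chunks must have the same number of dimensions")
    ranges = []
    for (start, stop), size in zip(region, chunks):
        if stop <= start:
            raise ValueError("each range must satisfy start < stop")
        first = start // size
        last = (stop - 1) // size  # chunk holding the last requested element
        ranges.append(range(first, last + 1))
    # Cartesian product of per-dimension chunk indices, formatted as keys.
    return [separator.join(str(i) for i in idx) for idx in itertools.product(*ranges)]


# One time step, a 100 x 100 window straddling chunk boundaries on both axes:
print(chunk_keys([(0, 1), (250, 350), (700, 800)], (1, 300, 720)))
# ['0.0.0', '0.0.1', '0.1.0', '0.1.1']
```

Each key is fetched as a separate object, so the chunk shape directly controls how much data the browser downloads for a given view.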

This cookbook describes these 2 approaches. We discuss the tradeoffs and the data preprocessing requirements, and present performance testing results with and without those preprocessing steps. We hope that readers can reuse these lessons learned and recommendations to deliver their Zarr data to users in web browsers, and so contribute to wider adoption of this format for large-scale environmental data.
