NLCD2016 Tree Canopy #1243

Draft: wants to merge 11 commits into base: main

Conversation

isaaccorley
Collaborator

This PR adds the NLCD2016 Tree Canopy dataset

See https://www.mrlc.gov/data/nlcd-2016-usfs-tree-canopy-cover-conus

@isaaccorley isaaccorley marked this pull request as draft April 14, 2023 02:56
@isaaccorley isaaccorley self-assigned this Apr 14, 2023
@github-actions github-actions bot added the datasets Geospatial or benchmark datasets label Apr 14, 2023
class NLCD2016TreeCanopy(RasterDataset):
"""National Land Cover Database 2016 (NLCD2016) - Tree Canopy dataset.

The `National Land Cover Database <https://www.mrlc.gov/>`_ provides 30m tree
Collaborator

I would link to the tree canopy page, not the NLCD page

Member

Suggested change
- The `National Land Cover Database <https://www.mrlc.gov/>`_ provides 30m tree
+ The `Multi-Resolution Land Characteristics (MRLC) Consortium <https://www.mrlc.gov/>`_ provides 30m tree


@adamjstewart adamjstewart added this to the 0.5.0 milestone Apr 14, 2023
calebrob6
calebrob6 previously approved these changes Apr 14, 2023

@calebrob6
Member

Comparison of the IMG format to the COG format:

  • IMG is uncompressed: the tree canopy dataset alone is 20 GB on disk (16832104560 pixels at 1 byte each, plus overviews) :)
  • The IMG format consists of .html, .ige, .img, and .img.xml files
  • COG is 4.2 GB total, with no extra files and lossless compression
  • Random windowed reads on the IMG data take ~0.6 seconds per 1000 reads
  • Random windowed reads on the COG data take 2.1 seconds per 1000 reads
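
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a random windowed-read benchmark. It is not the script used for the numbers above: the 256x256 window size, the seed, and the helper names `random_windows` and `time_windowed_reads` are all assumptions made for illustration. It uses rasterio's `Window` and `DatasetReader.read(..., window=...)`.

```python
import random
import time


def random_windows(width, height, size, n, seed=0):
    """Return n random (col_off, row_off) offsets for size x size windows.

    Offsets are chosen so that every window lies fully inside the raster.
    """
    rng = random.Random(seed)
    return [
        (rng.randint(0, width - size), rng.randint(0, height - size))
        for _ in range(n)
    ]


def time_windowed_reads(path, size=256, n=1000, seed=0):
    """Time n random windowed reads of band 1 from the raster at `path`.

    The window size and read count are assumptions, not values from the thread.
    """
    # Imported lazily so the sampling helper above works without rasterio.
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(path) as src:
        offsets = random_windows(src.width, src.height, size, n, seed)
        start = time.perf_counter()
        for col_off, row_off in offsets:
            src.read(1, window=Window(col_off, row_off, size, size))
        return time.perf_counter() - start
```

Calling `time_windowed_reads` once on the .img file and once on the COG (with the same seed, so both see identical windows) gives two comparable timings. Note that OS file caching can skew repeated runs.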

image

@adamjstewart
Collaborator

Surprised img is faster than COGs, I thought COGs were the gold standard.

@adamjstewart
Collaborator

This will need to be rebased once #1244 is merged.

@calebrob6
Member

> Surprised img is faster than COGs, I thought COGs were the gold standard.

I'm guessing the difference is compression related (COG is 5x smaller and 3x slower to read). It is an apples-to-oranges comparison, though: if these were hosted on a remote server, you could still do windowed reading quickly with a COG.

@calebrob6
Member

Confirmed that the difference is entirely compression related:

image
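
The effect can be illustrated without any geospatial stack. The sketch below is a standard-library toy, not the benchmark from the screenshot: it builds synthetic 1-byte-per-pixel tiles (the 256x256 tile size and the run-length data are assumptions), stores them raw and zlib-compressed, and times random tile reads from each. The compressed store is smaller and lossless, but every read pays a decompression cost.

```python
import random
import time
import zlib

TILE_BYTES = 256 * 256  # one tile of 1-byte pixels; the tile size is an assumption


def make_tiles(n_tiles, seed=0):
    """Build synthetic tiles made of runs of repeated values in 0..100."""
    rng = random.Random(seed)
    tiles = []
    for _ in range(n_tiles):
        tile = bytearray()
        while len(tile) < TILE_BYTES:
            # A run of one value, standing in for a patch of uniform cover.
            tile += bytes([rng.randint(0, 100)]) * rng.randint(1, 512)
        tiles.append(bytes(tile[:TILE_BYTES]))
    return tiles


def time_reads(tiles, decode, n_reads=1000, seed=0):
    """Time n_reads random tile reads, applying `decode` to each stored tile."""
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(n_reads):
        decode(tiles[rng.randrange(len(tiles))])
    return time.perf_counter() - start


raw = make_tiles(64)
packed = [zlib.compress(tile) for tile in raw]  # lossless compression

raw_seconds = time_reads(raw, lambda tile: tile)      # nothing to decode
packed_seconds = time_reads(packed, zlib.decompress)  # decode on every read

print(f"raw:    {sum(map(len, raw))} bytes, {raw_seconds:.4f} s per 1000 reads")
print(f"packed: {sum(map(len, packed))} bytes, {packed_seconds:.4f} s per 1000 reads")
```

The absolute timings depend on the machine and say nothing about GDAL's actual codecs. The point is only that the size saving and the read-time overhead come from the same place.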

@calebrob6
Member

calebrob6 commented Apr 14, 2023

(for completeness, because I was curious)

> It is an apples-to-oranges comparison, though: if these were hosted on a remote server, you could still do windowed reading quickly with a COG.

Not quite, actually: you can still do windowed reading from remote files with the Erdas Imagine format, but it is 2x slower than with COGs. Also, compression vs. no compression doesn't seem to matter when reading from remote files (compressed looks slightly faster, which makes sense because the time it takes to transfer the data is going to dominate).

image
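
A back-of-the-envelope model shows why transfer time would dominate. This is only a sketch: the additive cost model, the 256x256 window, the 10 MB/s bandwidth, and the 50 ms request latency are hypothetical. The per-read decode overhead is derived from the local timings earlier in this thread (2.1 s vs. 0.6 s per 1000 reads), and the 5x ratio is the rough size difference quoted above.

```python
def read_seconds(payload_bytes, bandwidth_bytes_per_s, latency_s, decode_s):
    """Additive model: request latency + transfer time + decode time."""
    return latency_s + payload_bytes / bandwidth_bytes_per_s + decode_s


WINDOW_BYTES = 256 * 256       # 1 byte per pixel; window size is an assumption
RATIO = 5                      # approximate compression ratio from the thread
DECODE_S = (2.1 - 0.6) / 1000  # extra seconds per compressed read, from local timings
BANDWIDTH = 10e6               # hypothetical 10 MB/s link
LATENCY_S = 0.05               # hypothetical 50 ms per request

remote_raw = read_seconds(WINDOW_BYTES, BANDWIDTH, LATENCY_S, 0.0)
remote_packed = read_seconds(WINDOW_BYTES / RATIO, BANDWIDTH, LATENCY_S, DECODE_S)

print(f"remote uncompressed: {remote_raw * 1000:.2f} ms per read")
print(f"remote compressed:   {remote_packed * 1000:.2f} ms per read")
```

Under these assumed numbers, the bytes saved on the wire outweigh the decode cost, so the compressed read comes out slightly ahead. With a much faster link, the decode term would win instead, as it does for local reads.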

@calebrob6
Member

TL;DR -- use COGs

@adamjstewart adamjstewart removed this from the 0.5.0 milestone Sep 28, 2023
@adamjstewart
Collaborator

We now have a generic NLCD dataset, so this should likely be added to nlcd.py instead of becoming its own unrelated dataset.

Labels
datasets Geospatial or benchmark datasets

3 participants