SatlasPretrain: add new dataset #2248
Reference implementation:

* https://github.com/allenai/satlas/blob/main/satlas/model/dataset.py
Not sure if it's worth adding this in the docstring or just leaving it in the comments only. Happy to add pointers to the official codebase/data loaders for Satlas which may be preferred by some users.
torchgeo/datasets/satlas.py
    'metadata': (),
}

# NOTE: 'tci' is RGB (b04-02), not BGR (b02-04)
We could optionally add a bands parameter that lets users specify the order of spectral bands returned by the model, but so far I don't think we need this feature.
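If such a bands parameter were added later, the reordering could be sketched roughly as below. This is only an illustration: band_indices and TCI_BANDS are hypothetical names that do not appear in the PR, and the channel order is taken from the NOTE above (RGB = b04, b03, b02).

```python
from collections.abc import Sequence


def band_indices(available: Sequence[str], requested: Sequence[str]) -> list[int]:
    """Map requested band names to channel indices (hypothetical helper)."""
    missing = [band for band in requested if band not in available]
    if missing:
        raise ValueError(f'Unknown bands: {missing}')
    return [available.index(band) for band in requested]


# Assumed channel order of the 'tci' product, per the NOTE above.
TCI_BANDS = ('b04', 'b03', 'b02')

# Channel indices that would turn the RGB order into BGR.
bgr = band_indices(TCI_BANDS, ('b02', 'b03', 'b04'))
```

With a channels-first tensor, indexing the first dimension with this list (for example image[bgr]) would return the channels in the requested order.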
torchgeo/datasets/satlas.py
        channels.append(torch.tensor(np.array(img, dtype=np.float32)))
    return torch.cat(channels)

def _load_label(self, label: str, col: int, row: int) -> Tensor:
TODO: add support for vector labels.
torchgeo/datasets/satlas.py
        sample: dict[str, Tensor] = {}

        for image in self.images:
            sample[f'image_{image}'] = self._load_image(image, col, row)
The design decision here is to use image_{landsat,naip,sentinel} as the key so we can retain support for kornia.augmentation.AugmentationSequential auto-detecting the type of key/value pairs. I'm not sure how important this actually is, since many augmentations like Normalize will be unique to each image, but it could be useful for augmentations like RandomCrop.
I'm curious if anyone has ever managed to successfully download https://github.com/allenai/satlas/blob/main/satlaspretrain_urls.txt because I have been trying for weeks and the download always dies in the middle.
Have you tried the aws-cli?
How do I convert these URLs to s3 equivalents?
You can also do: From working with Maxar Open Data on S3 and similar on Azure -- using
+1 to Caleb's suggestion. Best to run in a screen / tmux - mine took a long time, and unzipping s2 took a day or so (on slower disks)!
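On the question of converting the URLs to s3 equivalents, a minimal sketch is shown below. It assumes the download links are virtual-hosted-style S3 URLs (host of the form <bucket>.s3.amazonaws.com or <bucket>.s3.<region>.amazonaws.com), which this thread does not confirm; https_to_s3 is a hypothetical helper and the bucket name in the example is a placeholder.

```python
from urllib.parse import urlparse


def https_to_s3(url: str) -> str:
    """Convert a virtual-hosted-style S3 HTTPS URL to an s3:// URI.

    Assumes the host looks like <bucket>.s3.amazonaws.com or
    <bucket>.s3.<region>.amazonaws.com; anything else is rejected.
    """
    parsed = urlparse(url)
    # Split the host into the bucket name and the S3 endpoint suffix.
    bucket, sep, rest = parsed.netloc.partition('.s3.')
    if not sep or not rest.endswith('amazonaws.com'):
        raise ValueError(f'Not a virtual-hosted-style S3 URL: {url}')
    return f's3://{bucket}{parsed.path}'


# Placeholder URL, not one of the real SatlasPretrain links.
example = https_to_s3('https://example-bucket.s3.amazonaws.com/path/to/file.tar')
```

The resulting URI could then be passed to aws s3 cp. Whether the bucket allows anonymous access (the --no-sign-request option) is also an assumption to verify.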
This PR adds a data loader for the SatlasPretrain dataset.
This is a work in progress:
References:
@favyen2 @piperwolters can you review this PR as time permits? I'm still in the process of downloading the entire dataset, so it's going to be a bit before I can actually test it myself, but wanted to open a WIP PR anyway. I have a ton of questions for you that I'll leave in-line and we can resolve as we finalize the PR. The first draft will likely have significantly limited functionality compared to your reference implementation, but that's fine for our use case. We can always expand it in the future.
@ando-shah and I are planning on heavily using your dataset for our next paper. Our specific use case requires us to only sample from tiles where all low-resolution products (S1, S2, L) are available. My current plan is to create a custom metadata/train_lowres_matching.json file containing the pared-down list of tiles. We can host this in our repo, or you're also welcome to include this in your metadata.tar.gz file once it's complete.
(Yes, the original image is that washed out; the TCI product is not great.)
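The pared-down tile list described above could be built along these lines. This is a rough sketch under stated assumptions: tiles are identified by (col, row) pairs, the output is a plain JSON list of [col, row] pairs (the actual Satlas metadata layout may differ), and matching_tiles, write_split, and the product keys are hypothetical names.

```python
import json
from pathlib import Path


def matching_tiles(tiles_by_product: dict[str, set[tuple[int, int]]]) -> list[list[int]]:
    """Return sorted [col, row] pairs that are present in every product."""
    if not tiles_by_product:
        return []
    common = set.intersection(*tiles_by_product.values())
    return [list(tile) for tile in sorted(common)]


def write_split(path: str, pairs: list[list[int]]) -> None:
    """Write the tile list as JSON (assumed format, see lead-in)."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(pairs))


# Toy tile sets; real ones would come from scanning the downloaded dataset.
tiles = {
    'sentinel1': {(1, 2), (3, 4), (5, 6)},
    'sentinel2': {(1, 2), (3, 4)},
    'landsat': {(3, 4), (1, 2), (7, 8)},
}
paired = matching_tiles(tiles)
```

Calling write_split('metadata/train_lowres_matching.json', paired) would then produce the custom split file.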