Add Digital typhoon dataset #1748

nilsleh · 2023-11-30T21:47:51Z

This PR adds the Digital Typhoon Dataset.

The implementation allows the following features:

create an input sequence of single-channel images concatenated along the channel dimension for a nowcasting task (predicting the label of the last image in that sequence)
filter samples by min or max feature values
datamodule that lets you split by storm id (disjoint sets of storm ids) or over the time domain (disjoint sets over the time domain within each storm)
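The sequence construction and the two split modes described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: the record layout, the function names (`build_sequences`, `split_by_storm_id`, `split_by_time`), and the 80/20 fraction are all hypothetical, and the list of time steps in each window stands in for the single-channel images that would be stacked along the channel dimension.

```python
from collections import defaultdict

# Hypothetical records: (storm_id, time_step, label).
# Real samples would also carry a single-channel image per time step.
records = [
    ("A", 0, 10.0), ("A", 1, 12.0), ("A", 2, 15.0), ("A", 3, 18.0),
    ("B", 0, 20.0), ("B", 1, 22.0), ("B", 2, 25.0),
    ("C", 0, 30.0), ("C", 1, 33.0), ("C", 2, 35.0), ("C", 3, 38.0),
]


def build_sequences(records, sequence_length):
    """Group records by storm and emit sliding windows of consecutive steps.

    Each window is (storm_id, [time steps], label of the last step).
    """
    by_storm = defaultdict(list)
    for storm_id, step, label in records:
        by_storm[storm_id].append((step, label))

    sequences = []
    for storm_id, items in sorted(by_storm.items()):
        items.sort()  # order each storm's records by time step
        for start in range(len(items) - sequence_length + 1):
            window = items[start : start + sequence_length]
            steps = [step for step, _ in window]
            sequences.append((storm_id, steps, window[-1][1]))
    return sequences


def split_by_storm_id(sequences, train_fraction=0.8):
    """Assign whole storms to train or validation, so storm ids never overlap."""
    storm_ids = sorted({seq[0] for seq in sequences})
    n_train = max(1, int(len(storm_ids) * train_fraction))
    train_ids = set(storm_ids[:n_train])
    train = [seq for seq in sequences if seq[0] in train_ids]
    val = [seq for seq in sequences if seq[0] not in train_ids]
    return train, val


def split_by_time(sequences, train_fraction=0.8):
    """Within each storm, put the earliest windows in train, the rest in validation."""
    by_storm = defaultdict(list)
    for seq in sequences:
        by_storm[seq[0]].append(seq)

    train, val = [], []
    for storm_id in sorted(by_storm):
        seqs = sorted(by_storm[storm_id], key=lambda seq: seq[1][-1])
        n_train = max(1, int(len(seqs) * train_fraction))
        train.extend(seqs[:n_train])
        val.extend(seqs[n_train:])
    return train, val


sequences = build_sequences(records, sequence_length=2)
train_id, val_id = split_by_storm_id(sequences)
train_t, val_t = split_by_time(sequences)
```

In this sketch the time split orders windows by their last time step, so neighbouring train and validation windows can still share earlier frames. Whether a gap between the two is needed depends on the task.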

TODO:

Target Normalization for regression task
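As a rough idea of what target normalization for a regression task could look like, here is a minimal z-score sketch. It assumes the statistics are computed on the training targets only; the function names and the example values are hypothetical and are not taken from this PR.

```python
from statistics import mean, pstdev


def fit_target_stats(train_targets):
    """Compute mean and population standard deviation of the training targets."""
    mu = mean(train_targets)
    sigma = pstdev(train_targets)
    if sigma == 0:
        sigma = 1.0  # avoid division by zero for constant targets
    return mu, sigma


def normalize_target(target, mu, sigma):
    """Map a raw target to zero mean and unit variance."""
    return (target - mu) / sigma


def denormalize_target(value, mu, sigma):
    """Invert normalize_target, e.g. to report predictions in original units."""
    return value * sigma + mu


# Hypothetical raw regression targets from a training split.
train_targets = [10.0, 20.0, 30.0]
mu, sigma = fit_target_stats(train_targets)
normalized = [normalize_target(t, mu, sigma) for t in train_targets]
restored = [denormalize_target(v, mu, sigma) for v in normalized]
```

Keeping the inverse function next to the forward one makes it easier to report metrics in the original units.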

Sample Image:

calebrob6 · 2023-12-02T06:04:12Z

This is really cool! I wonder if there is any generalization between this and the Cyclone dataset

nilsleh · 2023-12-02T07:58:01Z

> This is really cool! I wonder if there is any generalization between this and the Cyclone dataset

Stay tuned :)

torchgeo/datasets/digital_typhoon.py

docs/api/non_geo_datasets.csv

tests/data/digital_typhoon/data.py

tests/datamodules/test_digital_typhoon.py

tests/datasets/test_digital_typhoon.py

torchgeo/datasets/digital_typhoon.py

adamjstewart · 2024-08-21T09:22:04Z

torchgeo/datasets/digital_typhoon.py

+            )
+        )
+
+        # torchgeo expects a single label


We can fix this if needed. I'm trying to add support for multi-label classification in #2219.

adamjstewart · 2024-08-21T09:23:25Z

torchgeo/datasets/digital_typhoon.py

+                # tensor with added channel dimension
+                tensor = torch.from_numpy(h5f['Infrared'][:]).unsqueeze(0)
+
+                # follow normalization procedure


Wondering if this should be in the datamodule instead. Possibly also true for other stuff.

nilsleh · 2024-08-21

I think it has to be part of the dataset because the normalization values can change. In the datamodule it would become quite messy, and overall I think it's much nicer if normalization happens in the dataset and any augmentations happen in the datamodule.

adamjstewart · 2024-08-21

My only complaint is that the user can't perform their own normalization, or if they do, it is now duplicated. Definitely something worth discussing more broadly though (not just in your PR, for all datasets).
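To make the trade-off in this thread concrete, here is one possible shape for a dataset that normalizes by default but lets the user opt out and supply their own transform. This is a hypothetical sketch, not something proposed in the thread and not the torchgeo class: `ToyTyphoonDataset`, the `normalize` flag, and the min/max values are all assumptions made for illustration.

```python
class ToyTyphoonDataset:
    """Hypothetical stand-in for a dataset that owns its normalization."""

    def __init__(self, pixels, min_value, max_value, normalize=True, transforms=None):
        self.pixels = pixels
        self.min_value = min_value
        self.max_value = max_value
        self.normalize = normalize    # assumed opt-out flag
        self.transforms = transforms  # optional user-supplied callable

    def __len__(self):
        return len(self.pixels)

    def __getitem__(self, index):
        value = self.pixels[index]
        if self.normalize:
            # Min-max scaling with dataset-specific bounds.
            value = (value - self.min_value) / (self.max_value - self.min_value)
        sample = {"image": value}
        if self.transforms is not None:
            sample = self.transforms(sample)
        return sample


# Default behaviour: the dataset normalizes for the user.
default_ds = ToyTyphoonDataset([200.0], min_value=100.0, max_value=300.0)

# Opt out and apply a custom normalization instead, avoiding duplication.
custom_ds = ToyTyphoonDataset(
    [200.0],
    min_value=100.0,
    max_value=300.0,
    normalize=False,
    transforms=lambda sample: {"image": sample["image"] / 1000.0},
)
```

With a flag like this, the dataset-specific defaults stay close to the data while users who want a different scheme do not end up normalizing twice.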

adamjstewart · 2024-08-21T09:24:23Z

torchgeo/datasets/digital_typhoon.py

+            name: torch.tensor(feature_df[name].item()).float()
+            for name in self.features
+        }
+        # normalize the targets for regression


torchgeo/datasets/digital_typhoon.py

adamjstewart

Looks pretty good now. Only remaining comments worth addressing before we merge are:

1. list -> tuple
2. Dataset name

Item 2 in particular is important to avoid API changes in the future. All other comments can be addressed later.

adamjstewart · 2024-08-28T11:04:56Z

Inspired by this rename I ~~wasted~~ invested 2 hrs of my life writing https://github.com/adamjstewart/dotfiles/blob/master/bin/git-sed

adamjstewart

Just need to get minimum tests passing now.

nilsleh added 4 commits November 13, 2023 10:01

analysis task dataset

c82a5e7

merge main

1da1fa1

implement sequence sampling

c31f73a

add outline datamodule

a4bde5e

nilsleh marked this pull request as draft November 30, 2023 21:48

github-actions bot added datasets Geospatial or benchmark datasets datamodules PyTorch Lightning datamodules labels Nov 30, 2023

adamjstewart added this to the 0.6.0 milestone Nov 30, 2023

nilsleh added 3 commits December 1, 2023 15:42

add datamodule with two way splitting capabilities

e2a37a5

add plotting function

a2881af

download and verify

49254ad

github-actions bot added the testing Continuous integration testing label Dec 1, 2023

add unit tests but they fail

bcaaed9

fix tests

6272191

github-actions bot added the documentation Improvements or additions to documentation label Dec 2, 2023

nilsleh added 6 commits December 2, 2023 12:28

fix style

4477cf6

trainer testing yaml

0028a22

test split logic

a939656

fix tests

ba94b79

fix tests2

6963138

found bug

aeec4dd

nilsleh marked this pull request as ready for review December 2, 2023 14:18

nilsleh added 4 commits December 2, 2023 14:24

merge main

171fed8

try to fix mypy

82c5bed

h5py error docs

407d50f

fix docs

48cb869

nilsleh commented Dec 7, 2023

merge main

f6959a4

nilsleh added 6 commits August 20, 2024 07:58

docs

9a91c4e

lazy import

82af758

h5py

35668fb

h5py datamodule

c6850cf

typo

b9e0db0

tests

5fb6074

nilsleh added this to the 0.6.0 milestone Aug 20, 2024

adamjstewart requested changes Aug 21, 2024

nilsleh added 3 commits August 21, 2024 10:52

review

a5afd95

pass tests

bb35ecf

fix tests

036f526

adamjstewart reviewed Aug 27, 2024

adamjstewart reviewed Aug 27, 2024

adamjstewart and others added 5 commits August 27, 2024 16:43

list -> tuple

751c475

mypy fix

9e2ca7a

rename

0013418

Merge branch 'main' into digital_typhoon

d4c16a9

tests

836681e

Remove Analysis

87ebb44

adamjstewart previously approved these changes Aug 28, 2024

nilsleh added 2 commits August 28, 2024 17:08

min pandas 2.2.0

2b40528

pull

b555e09

nilsleh dismissed adamjstewart’s stale review via b555e09 August 28, 2024 17:11

github-actions bot added the dependencies Packaging and dependencies label Aug 28, 2024

resolve tests

37a74a8

github-actions bot removed the dependencies Packaging and dependencies label Aug 29, 2024

Merge branch 'main' into digital_typhoon

5512500

adamjstewart approved these changes Aug 29, 2024

adamjstewart merged commit b9a09f5 into microsoft:main Aug 29, 2024
19 checks passed
