Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Datamodule #66

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

CorentinSeznec
Copy link
Contributor

  • Add datamodule as a wrapper of dataloaders.
  • Add merge_dicts to Smeagol and Poesy Dataset

README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@LBerth LBerth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job !
I just wonder if we should have one big datamodule that manages all datasets, or one datamodule per dataset ? I think that if we want to use lightning CLI, it uses the second options. @A669015 what do you think ?

py4cast/lightning.py Outdated Show resolved Hide resolved
py4cast/lightning.py Outdated Show resolved Hide resolved
@A669015
Copy link
Contributor

A669015 commented Oct 8, 2024

Good job ! I just wonder if we should have one big datamodule that manages all datasets, or one datamodule per dataset ? I think that if we want to use lightning CLI, it uses the second options. @A669015 what do you think ?

Yes that is the philosophy behind the Datamodule to be associated to a Dataset. I guess a "big" generic Datamodule could be used if the datasets are transparently replaceable each other (finally that is the philosophy of py4cast) .
Is there for some datasets the requirement of dataset-specific parameters, that could then make less nice the usage of the lightning CLI with a generic datamodule ?

@colon3ltocard
Copy link
Contributor

colon3ltocard commented Oct 8, 2024

Good job ! I just wonder if we should have one big datamodule that manages all datasets, or one datamodule per dataset ? I think that if we want to use lightning CLI, it uses the second options. @A669015 what do you think ?

The dataset name being a str arg to the datamodule I think even as it is the CLI would introspect and be able to pass the argument from the cli to the datamodule and thus dataset selection would work.

https://lightning.ai/docs/pytorch/stable/cli/lightning_cli_advanced_3.html#multiple-models-and-or-datasets

data:
  class_path: py4cast.lightning.plDataModule
  init_args:
    dataset: titan
...

But it is probably more flexible to have one DataModule subclass per dataset.

bin/inference.py Outdated Show resolved Hide resolved
bin/inference.py Outdated Show resolved Hide resolved
bin/inference.py Show resolved Hide resolved
Comment on lines +72 to +82
@property
def get_len_train_dl(self):
return len(self.train_ds.torch_dataloader(self.dl_settings))

@property
def get_train_dataset_info(self):
return self.train_ds.dataset_info

@property
def get_infer_ds(self):
return self.test_ds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should remove the get prefix, it is not necessary for properties and very "java like"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants