
Making train_params and model_params functional to add documentation and avoid duplication #38

Open · gokceneraslan opened this issue Aug 5, 2024 · 6 comments
Labels: enhancement (New feature or request)

gokceneraslan (Collaborator) commented Aug 5, 2024

Right now it's hard to document model_params and train_params because they are simply big dictionaries we copy from notebook to notebook, e.g.:

model_params = {
    'model_type': 'BorzoiPretrainedModel', # Type of model
    'n_tasks': 1, # Number of cell types to predict
    'crop_len': 0, # No cropping of the model output
    'n_transformers': 0, # Number of transformer layers; the published Enformer model has 11
}

train_params = {
    'task': 'binary', # binary classification
    'lr': 1e-4, # learning rate
    'logger': 'csv', # Logs will be written to a CSV file
    'batch_size': 1024,
    'num_workers': 8,
    'devices': 1, # GPU index
    'save_dir': experiment,
    'optimizer': 'adam',
    'max_epochs': 5,
    'checkpoint': True, # Save checkpoints
}

import grelu.lightning
model = grelu.lightning.LightningModel(model_params=model_params, train_params=train_params)

This also makes things highly redundant. I was wondering if we could simply move these into very simple functions, make_train_params() and make_model_params(), where the dictionaries above are the default arguments and the functions simply return them. The values could be overridden by the user, and the functions should have docstrings just like other functions. That would solve both the lack-of-documentation problem and the redundancy problem.

Final thing would look like:

model = grelu.lightning.LightningModel(
    model_params=make_model_params(model_type='BorzoiPretrainedModel'),
    train_params=make_train_params(lr=1e-5, devices=2),
)
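For concreteness, a minimal sketch of one such helper (the defaults mirror the train_params dictionary above; the save_dir default and the **overrides pass-through are illustrative, not existing gReLU API):

```python
def make_train_params(
    task="binary",      # binary classification
    lr=1e-4,            # learning rate
    logger="csv",       # logs will be written to a CSV file
    batch_size=1024,
    num_workers=8,
    devices=1,          # GPU index
    save_dir=".",       # illustrative default; the notebook uses an experiment name
    optimizer="adam",
    max_epochs=5,
    checkpoint=True,    # save checkpoints
    **overrides,        # any extra or experimental keys
):
    """Return a train_params dictionary, with the defaults
    documented once instead of copied from notebook to notebook."""
    params = dict(
        task=task, lr=lr, logger=logger, batch_size=batch_size,
        num_workers=num_workers, devices=devices, save_dir=save_dir,
        optimizer=optimizer, max_epochs=max_epochs, checkpoint=checkpoint,
    )
    params.update(overrides)
    return params
```

Calling make_train_params(lr=1e-5, devices=2) would then return the full default dictionary with just those two keys replaced.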

Let me know if this makes sense.

gokceneraslan (Collaborator, Author) commented Aug 5, 2024

Probably better to have separate model_params functions per model_type, because different architectures take different params; e.g. DilatedConvModel won't have n_transformers.
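One hypothetical way to keep per-architecture defaults separate without writing one function per model is a small registry keyed on model_type (all keys and values below are examples, not actual gReLU defaults):

```python
# Illustrative per-architecture defaults; values are examples only.
MODEL_DEFAULTS = {
    "BorzoiPretrainedModel": {"n_tasks": 1, "crop_len": 0, "n_transformers": 0},
    "DilatedConvModel": {"n_tasks": 1, "crop_len": 0},  # no n_transformers key
}

def make_model_params(model_type, **overrides):
    """Look up the documented defaults for model_type and apply user overrides."""
    if model_type not in MODEL_DEFAULTS:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    params = {"model_type": model_type, **MODEL_DEFAULTS[model_type]}
    params.update(overrides)
    return params
```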

ekageyama (Collaborator) commented
I see the reasoning for this, but is there a reason why you want them in a function? Wouldn't it be simpler to have a JSON/YAML file in a directory with the models, load the params directly from there, and override them as needed? If you make a function, it becomes gReLU-specific instead of a more generic solution.

gokceneraslan (Collaborator, Author) commented Aug 5, 2024

  • As long as the defaults are well-documented, easy to access in a notebook, and easy to override, I don't think it matters much where they live; JSON might be OK too. Can you write what grelu.lightning.LightningModel would look like with the JSON solution, and how defaults would be overridden?

  • I don't think we will have tens or hundreds of different model types, and we won't support a non-gReLU model anyway. So no need to overthink/overcomplicate/overstandardize this, IMO.

  • Basically, the user should be able to start a new notebook and train e.g. a simple DilatedConvModel without looking at models.py or copy/pasting the big dictionary from the tutorial, while still having the flexibility to override things and do simple hyperparameter searches. That is the goal. The user also shouldn't need to know whether there is a JSON file somewhere, or go looking for it in e.g. grelu.resources.

ekageyama (Collaborator) commented
How I would do it: create a pydantic model for your models, which would validate the model params. If you want a function that returns said params, the function reads from a default file and returns the new params; you could add functionality that replaces some elements if you pass key/value pairs, and runs validation afterwards to ensure the new params are correct. The nice thing about this approach is that the pydantic models can be exported, and you guarantee that the params are valid.
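To show the shape of that idea without assuming pydantic is installed, here is a stdlib dataclass analogue of the validation step (a real implementation would use pydantic's BaseModel and its export methods; the class, fields, and checks below are purely illustrative):

```python
from dataclasses import dataclass, asdict

@dataclass
class DilatedConvParams:
    """Validated params for one model type. Invalid values fail fast
    here instead of deep inside a PyTorch stack trace."""
    model_type: str = "DilatedConvModel"
    n_tasks: int = 1
    crop_len: int = 0

    def __post_init__(self):
        if self.n_tasks < 1:
            raise ValueError("n_tasks must be >= 1")
        if self.crop_len < 0:
            raise ValueError("crop_len must be >= 0")

def make_model_params(**overrides):
    """Apply overrides to the schema's defaults, validate, and return a plain dict."""
    return asdict(DilatedConvParams(**overrides))
```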

dagarfield (Collaborator) commented Aug 5, 2024 via email

ekageyama (Collaborator) commented
In theory a new user wouldn't touch the JSONs; as Gokcen mentioned, it would be something like get_mode_param("DilatedConvModel"), which would read and validate the JSON and give you your param dictionary. But if you give someone else a model, they can ingest and validate it. This would enforce that models are complete and their values correct. I think a validation error is better than a PyTorch one.

@avantikalal avantikalal added the enhancement New feature or request label Oct 14, 2024
@avantikalal avantikalal self-assigned this Oct 15, 2024