This repository has been archived by the owner on Feb 6, 2024. It is now read-only.

Chapter on writing your own training loop instead of using allennlp commands #118

Open
matt-gardner opened this issue Jun 4, 2020 · 3 comments

Comments

@matt-gardner
Contributor

This is just a placeholder: https://guide.allennlp.org/writing-your-own-script

@NicolasAG

In this chapter, it would be nice to also see how to use more than one GPU.
In my custom script I have a build_trainer() method that returns a GradientDescentTrainer, which I initialize like this:

def build_trainer(...):
    ...
    if torch.cuda.is_available():
        # cuda_devices holds the GPU count; -1 is a CPU sentinel.
        cuda_devices = torch.cuda.device_count()
        model = model.cuda(0)  # move the model to the first GPU
    else:
        cuda_devices = -1

    return GradientDescentTrainer(
        ...
        cuda_device=0 if cuda_devices > 0 else None,
        distributed=(cuda_devices > 1),
        local_rank=0,
        world_size=abs(cuda_devices),  # abs(-1) == 1 on CPU
        ...
    )

It works well for 0 or 1 GPU, but fails miserably when I use more than one.
I'm looking at how the allennlp train command does it (https://github.com/allenai/allennlp/blob/master/allennlp/commands/train.py#L221) and it seems a bit complicated :/
Is there a clean way to use multiple GPUs from a custom script?

@matt-gardner
Contributor Author

Yes, this is something that is currently not very easy to do in your own script, unfortunately. You'd want to pattern your code after what we do in the file you referenced. There's probably some refactoring we could do to make this easier, but it's not high on our priority list right now.
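
Roughly, the pattern in that file is: spawn one worker process per GPU, have each worker join a torch.distributed process group, and build the trainer inside the worker with distributed=True. A minimal sketch of that shape (not the actual allennlp implementation; the init address, port, and elided arguments are placeholders, mirroring the snippet above):

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

from allennlp.training import GradientDescentTrainer

def _train_worker(process_rank, world_size):
    # Each worker joins the process group before building its trainer.
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:29500",  # placeholder address/port
        world_size=world_size,
        rank=process_rank,
    )
    # Build the model, data loaders, etc. inside the worker; objects
    # created before the spawn cannot be shared across processes.
    ...
    trainer = GradientDescentTrainer(
        ...
        cuda_device=process_rank,  # one GPU per worker
        distributed=True,
        local_rank=process_rank,
        world_size=world_size,
        ...
    )
    trainer.train()

def main():
    world_size = torch.cuda.device_count()
    if world_size > 1:
        # One training process per GPU; spawn blocks until all workers finish.
        mp.spawn(_train_worker, args=(world_size,), nprocs=world_size)
    else:
        # Fall back to the single-process setup from the comment above.
        ...

The real command also coordinates things this sketch leaves out (building the vocabulary once and sharing it across workers, per-process logging, and so on), which is most of why train.py looks complicated.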

@LKChenLK

LKChenLK commented Nov 9, 2022

Hello, I'd like to know whether this chapter will ever be written, now that AllenNLP has gone into maintenance mode. If not, are there any good example scripts that I can look at or modify to write my own training loop?
Thanks.
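
For reference, a minimal single-process training loop in AllenNLP looks roughly like the sketch below. build_dataset_reader() and build_model() are hypothetical stand-ins for your own DatasetReader and Model construction, and the data path is a placeholder:

from allennlp.data import Vocabulary
from allennlp.data.data_loaders import SimpleDataLoader
from allennlp.training import GradientDescentTrainer
from allennlp.training.optimizers import AdamOptimizer

def run_training_loop():
    # build_dataset_reader() and build_model() are hypothetical helpers:
    # substitute your own DatasetReader and Model construction here.
    dataset_reader = build_dataset_reader()
    instances = list(dataset_reader.read("train.txt"))
    vocab = Vocabulary.from_instances(instances)
    model = build_model(vocab)

    data_loader = SimpleDataLoader(instances, batch_size=8, shuffle=True)
    data_loader.index_with(vocab)  # instances must be indexed with the vocab

    # AdamOptimizer takes (name, parameter) pairs, not bare parameters.
    parameters = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    trainer = GradientDescentTrainer(
        model=model,
        optimizer=AdamOptimizer(parameters),
        data_loader=data_loader,
        num_epochs=5,
    )
    trainer.train()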
