This repository has been archived by the owner on Feb 6, 2024. It is now read-only.

Chapter on writing your own training loop instead of using allennlp commands #118

Open
matt-gardner opened this issue Jun 4, 2020 · 3 comments

Comments

@matt-gardner
Contributor

This is just a placeholder: https://guide.allennlp.org/writing-your-own-script

@NicolasAG

In this chapter, it would be nice to also see how to use more than one GPU.
In my custom script I have a build_trainer() method that returns a GradientDescentTrainer, which I initialize like this:

def build_trainer(...):
    ...
    if torch.cuda.is_available():
        # cuda_devices holds the GPU count; -1 is a CPU sentinel.
        cuda_devices = torch.cuda.device_count()
        model = model.cuda(0)  # move the model to the first GPU
    else:
        cuda_devices = -1

    return GradientDescentTrainer(
        ...
        cuda_device=0 if cuda_devices > 0 else None,
        distributed=(cuda_devices > 1),
        local_rank=0,
        world_size=abs(cuda_devices),  # abs(-1) == 1 on CPU
        ...
    )

It works well for 0 or 1 GPU, but fails miserably when I use more than one.
I'm looking at how the allennlp train command does it (https://github.com/allenai/allennlp/blob/master/allennlp/commands/train.py#L221) and it seems a bit complicated :/
Is there a clean way to use multiple GPUs from a custom script?

@matt-gardner
Contributor Author

Yes, this is something that is currently not very easy to do in your own script, unfortunately. You'd want to pattern your code after what we do in the file you referenced. There's probably some refactoring we could do to make this easier, but it's not high on our priority list right now.
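
Roughly, the pattern in that file is: spawn one worker process per GPU, have each worker join a torch.distributed process group, and build the trainer inside the worker with distributed=True. A minimal sketch of that shape (not the actual allennlp implementation; the init address, port, and elided arguments are placeholders, mirroring the snippet above):

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

from allennlp.training import GradientDescentTrainer

def _train_worker(process_rank, world_size):
    # Each worker joins the process group before building its trainer.
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:29500",  # placeholder address/port
        world_size=world_size,
        rank=process_rank,
    )
    # Build the model, data loaders, etc. inside the worker; objects
    # created before the spawn cannot be shared across processes.
    ...
    trainer = GradientDescentTrainer(
        ...
        cuda_device=process_rank,  # one GPU per worker
        distributed=True,
        local_rank=process_rank,
        world_size=world_size,
        ...
    )
    trainer.train()

def main():
    world_size = torch.cuda.device_count()
    if world_size > 1:
        # One training process per GPU; spawn blocks until all workers finish.
        mp.spawn(_train_worker, args=(world_size,), nprocs=world_size)
    else:
        # Fall back to the single-process setup from the comment above.
        ...

The real command also coordinates things this sketch leaves out (building the vocabulary once and sharing it across workers, per-process logging, and so on), which is most of why train.py looks complicated.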

@LKChenLK

LKChenLK commented Nov 9, 2022

Hello, I'd like to know whether this chapter will ever be written, now that AllenNLP has gone into maintenance mode. If not, are there any good example scripts that I can look at or modify to write my own training loop?
Thanks.
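
For reference, a minimal single-process training loop in AllenNLP looks roughly like the sketch below. build_dataset_reader() and build_model() are hypothetical stand-ins for your own DatasetReader and Model construction, and the data path is a placeholder:

from allennlp.data import Vocabulary
from allennlp.data.data_loaders import SimpleDataLoader
from allennlp.training import GradientDescentTrainer
from allennlp.training.optimizers import AdamOptimizer

def run_training_loop():
    # build_dataset_reader() and build_model() are hypothetical helpers:
    # substitute your own DatasetReader and Model construction here.
    dataset_reader = build_dataset_reader()
    instances = list(dataset_reader.read("train.txt"))
    vocab = Vocabulary.from_instances(instances)
    model = build_model(vocab)

    data_loader = SimpleDataLoader(instances, batch_size=8, shuffle=True)
    data_loader.index_with(vocab)  # instances must be indexed with the vocab

    # AdamOptimizer takes (name, parameter) pairs, not bare parameters.
    parameters = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    trainer = GradientDescentTrainer(
        model=model,
        optimizer=AdamOptimizer(parameters),
        data_loader=data_loader,
        num_epochs=5,
    )
    trainer.train()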
