
[PyTorch] Should I use DistributedDataParallel? #2862

Answered by mchoi8739
RobinFrcd asked this question in Q&A

Hi,

The example you are looking at uses PyTorch's native DistributedDataParallel module.
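
For reference, the native pattern typically looks like the following minimal sketch (the toy model, NCCL backend, and rank handling are illustrative assumptions, not taken from the example):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Initialize the default process group; NCCL is the usual backend for GPUs.
dist.init_process_group(backend="nccl")

# LOCAL_RANK is set by PyTorch's launch utilities (e.g. torchrun).
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Toy model standing in for the example's network.
model = torch.nn.Linear(10, 1).cuda(local_rank)

# Wrap the model so gradients are all-reduced across processes.
model = DDP(model, device_ids=[local_rank])
```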

The SageMaker doc you looked at is about SageMaker's model parallel library. It says that the model parallel library does not support PyTorch's native DistributedDataParallel module. Instead, you can use SageMaker's data parallel library, which is compatible with the model parallel library.

The SageMaker data parallel library provides a DDP module equivalent to torch.nn.parallel.DistributedDataParallel. The library's distributed data parallelism modules, and how to adapt your PyTorch training scripts to use them, are documented at https://docs.aws.amazon.com/sagemaker/latest/dg/data…
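
As a rough sketch of what that adaptation looks like (based on the smdistributed.dataparallel API as I recall it from that era; check the doc above for the exact, current imports):

```python
import torch

# SageMaker's data parallel library mirrors torch.distributed and
# torch.nn.parallel.DistributedDataParallel (assumed v1-era import paths).
import smdistributed.dataparallel.torch.distributed as dist
from smdistributed.dataparallel.torch.parallel.distributed import (
    DistributedDataParallel as DDP,
)

# Initialize the SageMaker data parallel process group.
dist.init_process_group()
local_rank = dist.get_local_rank()
torch.cuda.set_device(local_rank)

# Toy model standing in for your network.
model = torch.nn.Linear(10, 1).cuda(local_rank)

# Same wrapping pattern as native DDP, but using SageMaker's module.
model = DDP(model)
```

The wrapping pattern is intentionally the same as native DDP, so the change is mostly the imports and the process-group setup.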

Replies: 1 comment 4 replies

Answer selected by RobinFrcd