Revamp TorchRec intro tutorial #3064
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3064
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit f799561 with merge base 97b20b3: This comment was automatically generated by Dr. CI and updates every 15 minutes.

kernels and GPU enabled operations to run

"""

+ ###############################################

"""

# Install stable versions for best reliability

- # Install stable versions for best reliability
+ # If you do not have the following components in your environment, please install them by running:
+ #
+ # .. code-block:: sh

!pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 -U
!pip3 install fbgemm_gpu --index-url https://download.pytorch.org/whl/cu121
!pip3 install torchmetrics==1.0.3
!pip3 install torchrec --index-url https://download.pytorch.org/whl/cu121

- !pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 -U
- !pip3 install fbgemm_gpu --index-url https://download.pytorch.org/whl/cu121
- !pip3 install torchmetrics==1.0.3
- !pip3 install torchrec --index-url https://download.pytorch.org/whl/cu121
+ #
+ # !pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 -U
+ # !pip3 install fbgemm_gpu --index-url https://download.pytorch.org/whl/cu121
+ # !pip3 install torchmetrics==1.0.3
+ # !pip3 install torchrec --index-url https://download.pytorch.org/whl/cu121

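Since the suggestion turns the install commands into comments, a reader may want a quick way to see which of these packages are already present. The following is a minimal sketch, not part of the PR: the package list is copied from the commands above, and `installed_versions` is a hypothetical helper name. It relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata

# Package names taken from the install commands above.
PACKAGES = ["torch", "fbgemm_gpu", "torchmetrics", "torchrec"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Any package reported as "not installed" can then be installed with the corresponding command from the tutorial.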
@@ -0,0 +1,1081 @@
"""
**Open Source Installation** (For Reference)

- **Open Source Installation** (For Reference)
+ Introduction to TorchRec

@@ -0,0 +1,1081 @@
"""
**Open Source Installation** (For Reference)
--------------------------------------------

- --------------------------------------------
+ ==================================

# Install stable versions for best reliability

!pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 -U

- !pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 -U
+ # !pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 -U

# Install stable versions for best reliability

!pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 -U
!pip3 install fbgemm_gpu --index-url https://download.pytorch.org/whl/cu121

- !pip3 install fbgemm_gpu --index-url https://download.pytorch.org/whl/cu121
+ # !pip3 install fbgemm_gpu --index-url https://download.pytorch.org/whl/cu121

!pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 -U
!pip3 install fbgemm_gpu --index-url https://download.pytorch.org/whl/cu121
!pip3 install torchmetrics==1.0.3

- !pip3 install torchmetrics==1.0.3
+ # !pip3 install torchmetrics==1.0.3

!pip3 install --pre torch --index-url https://download.pytorch.org/whl/cu121 -U
!pip3 install fbgemm_gpu --index-url https://download.pytorch.org/whl/cu121
!pip3 install torchmetrics==1.0.3
!pip3 install torchrec --index-url https://download.pytorch.org/whl/cu121

- !pip3 install torchrec --index-url https://download.pytorch.org/whl/cu121
+ # !pip3 install torchrec --index-url https://download.pytorch.org/whl/cu121

# Intro to TorchRec
# =================
#

- # Intro to TorchRec
- # =================
- #

.ci/docker/requirements.txt
Outdated
@@ -70,3 +70,5 @@ pycocotools
semilearn==0.3.2
torchao==0.0.3
segment_anything==1.0
torchrec

Can we pin dependencies to specific versions?

We can do torchrec==0.8 for now, and change to 1.0 once it's out?

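To make the pinning request easy to check, here is a small sketch that flags requirement lines without an exact `==` pin. It is an illustration only, not tooling from the repository: `unpinned` and the `PINNED` pattern are hypothetical names, and the pattern deliberately handles only simple `name==version` lines.

```python
import re

# Matches a requirement pinned to an exact version, e.g. "torchao==0.0.3".
PINNED = re.compile(r"^[A-Za-z0-9._\[\],-]+==[^=\s]\S*$")


def unpinned(lines):
    """Return requirement lines that lack an exact '==' pin."""
    result = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            result.append(line)
    return result


# The four lines shown in the hunk above.
sample = ["semilearn==0.3.2", "torchao==0.0.3", "segment_anything==1.0", "torchrec"]
print(unpinned(sample))  # ['torchrec']
```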
# Set up environment variables for distributed training
# RANK is which GPU we are on, default 0
os.environ["RANK"] = "0"
# How many devices in our "world", since Bento can only handle 1 process, 1 GPU

We should remove Bento, you can say Colab.

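For context, a single-process, single-GPU setup such as a Colab notebook usually needs four environment variables before `torch.distributed` can initialize through its `env://` method. The sketch below is an assumption about what the surrounding tutorial code sets, since only the `RANK` line is visible in this hunk; `single_process_env` is a hypothetical helper, and the address and port values are placeholders.

```python
import os


def single_process_env(master_addr="localhost", master_port="29500"):
    """Environment for one process on one device (for example, a Colab notebook)."""
    return {
        "RANK": "0",                 # index of this process; 0 because there is only one
        "WORLD_SIZE": "1",           # total number of processes in the "world"
        "MASTER_ADDR": master_addr,  # rendezvous host (assumed value)
        "MASTER_PORT": master_port,  # rendezvous port (assumed value)
    }


os.environ.update(single_process_env())
print({key: os.environ[key] for key in ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")})
```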
* Explore advanced techniques for distributing large embedding tables across multiple GPUs
.. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
:class-card: card-prerequisites

- * Explore advanced techniques for distributing large embedding tables across multiple GPUs
- .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
- :class-card: card-prerequisites
+ * Explore advanced techniques for distributing large embedding tables across multiple GPUs
+ .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
+ :class-card: card-prerequisites

# following dependencies:
#
# .. code-block:: sh

-

######################################################################
# Congrats!

- # Congrats!
+ # Conclusion

# Congrats!
# ---------
#
# You have now gone from training a distributed RecSys model all the way

- # You have now gone from training a distributed RecSys model all the way
+ # In this tutorial, you have gone from training a distributed RecSys model all the way

#
# You have now gone from training a distributed RecSys model all the way
# to making it inference ready.
# https://github.com/pytorch/torchrec/tree/main/torchrec/inference has a

- # https://github.com/pytorch/torchrec/tree/main/torchrec/inference has a
+ # The `TorchRec repo <https://github.com/pytorch/torchrec/tree/main/torchrec/inference>`__ has a

#
# For more information, please see our
# `dlrm <https://github.com/facebookresearch/dlrm/tree/main/torchrec_dlrm/>`__
# example, which includes multinode training on the criteo terabyte

- # example, which includes multinode training on the criteo terabyte
+ # example, which includes multinode training on the Criteo 1TB

# For more information, please see our
# `dlrm <https://github.com/facebookresearch/dlrm/tree/main/torchrec_dlrm/>`__
# example, which includes multinode training on the criteo terabyte
# dataset, using Meta’s `DLRM <https://arxiv.org/abs/1906.00091>`__.

- # dataset, using Meta’s `DLRM <https://arxiv.org/abs/1906.00091>`__.
+ # dataset using the methods described in `Deep Learning Recommendation Model for Personalization and Recommendation Systems (DLRM) <https://arxiv.org/abs/1906.00091>`__.

@@ -619,3 +619,32 @@ warmup
webp
wsi
wsis
Meta's

Can you run the sort command in vim (:sort), so it's sorted alphabetically?

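For anyone without vim at hand, roughly the same ordering can be produced in Python. This is a sketch under the assumption that the word list holds one entry per line and that a case-sensitive ordering is wanted, so uppercase entries such as `Meta's` sort before lowercase ones; `sort_word_list` is a hypothetical helper, and unlike an editor sort it drops blank lines.

```python
def sort_word_list(text):
    """Sort the lines of a word list by code point, one word per line."""
    # Blank lines are dropped so the result contains only entries.
    words = [line for line in text.splitlines() if line.strip()]
    return "\n".join(sorted(words)) + "\n"


# The five entries visible in the hunk above.
sample = "warmup\nwebp\nwsi\nwsis\nMeta's\n"
print(sort_word_list(sample), end="")
```

With this ordering, `Meta's` moves to the top of the five entries instead of sitting after `wsis`.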
# Embeddings in PyTorch
# ---------------------
#
# `torch.nn.Embedding <https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html>`__:

- # `torch.nn.Embedding <https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html>`__:
+ # :class:`torch.nn.Embedding`:

# Embedding table where forward pass returns the embeddings themselves as
# is.
#
# `torch.nn.EmbeddingBag <https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html>`__:

- # `torch.nn.EmbeddingBag <https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html>`__:
+ # :class:`torch.nn.EmbeddingBag`:

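The difference between the two classes referenced here can be illustrated without PyTorch. The sketch below is a dependency-free illustration of the semantics only, not the `torch.nn` API: `embedding_lookup` and `embedding_bag_sum` are hypothetical helpers, the table values are made up, and sum is just one of the pooling modes `torch.nn.EmbeddingBag` supports.

```python
# Toy table: 4 rows (ids 0-3), embedding dimension 2. Values are made up.
TABLE = [
    [0.0, 1.0],
    [2.0, 3.0],
    [4.0, 5.0],
    [6.0, 7.0],
]


def embedding_lookup(table, ids):
    """Embedding-style lookup: one output row per id, returned as is."""
    return [list(table[i]) for i in ids]


def embedding_bag_sum(table, ids, offsets):
    """EmbeddingBag-style lookup: ids are split into bags at `offsets`,
    and the rows of each bag are summed into a single vector."""
    dim = len(table[0])
    bounds = list(offsets) + [len(ids)]
    pooled = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        bag = [0.0] * dim
        for i in ids[start:end]:
            for d in range(dim):
                bag[d] += table[i][d]
        pooled.append(bag)
    return pooled


ids = [0, 2, 3, 1]
print(embedding_lookup(TABLE, ids))           # four rows, one per id
print(embedding_bag_sum(TABLE, ids, [0, 2]))  # two rows, one per bag
```

The plain lookup returns as many rows as there are ids, while the bag variant returns one pooled row per bag, which is the shape a recommendation model typically consumes for multi-valued features.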
# predictions and reduced model size. For example FP32 (4 bytes) in
# trained model to INT8 (1 byte) for each embedding weight. This is also
# necessary given the vast scale of embedding tables, as we want to use as
# few devices as possible for inference to minimize latency. \* **C++

That's unfortunately a bug in the theme that we can't currently address.

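The FP32-to-INT8 step described in the hunk above can be sketched in plain Python. This is a generic symmetric per-row scheme chosen for illustration; it is an assumption, not the quantization that TorchRec or FBGEMM actually applies, and `quantize_row` and `dequantize_row` are hypothetical helpers.

```python
def quantize_row(row):
    """Symmetric INT8 quantization of one embedding row.

    Returns (integer values in [-127, 127], scale), so each weight needs
    1 byte instead of 4, plus one shared scale per row.
    """
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize_row(q, scale):
    """Approximate reconstruction of the original FP32 values."""
    return [v * scale for v in q]


row = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_row(row)
restored = dequantize_row(q, scale)
print(q, scale)
print(restored)
```

The reconstruction error per weight is bounded by half the scale, which is the accuracy cost paid for the fourfold reduction in storage per weight.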
lgtm
Description
Revamps TorchRec tutorial for stable release
cc @iamzainhuda