Issues: pytorch/torchtitan
Process got stuck when trying to optimize different groups of parameters using different types of data
#584
opened Sep 18, 2024 by
Yangyi-Chen
DDP (replicate) + TP?
question
Further information is requested
#577
opened Sep 13, 2024 by
yzs981130
Wrong train_state.step when resuming from checkpoint for the second time
bug
Something isn't working
#571
opened Sep 8, 2024 by
LeoXinhaoLee
Pipeline Parallelism + FSDP
question
Further information is requested
#562
opened Aug 29, 2024 by
jeromeku
Fail-safe and partial redundancy for HSDP on unreliable compute
enhancement
New feature or request
#561
opened Aug 27, 2024 by
evkogs
PP UX/training confusion re: loss = -1. (need to better document or add auto logging of last rank loss?)
#550
opened Aug 21, 2024 by
lessw2020
2D whole model compile fails at embedding layer
bug
Something isn't working
#534
opened Aug 20, 2024 by
tianyu-l
[rfc] getting rid of seed-checkpoint for Pipeline Parallelism
enhancement
New feature or request
#514
opened Aug 10, 2024 by
wconstab
[Request] Decouple profiler profile_freq from memory snapshot frequency
enhancement
New feature or request
#475
opened Jul 23, 2024 by
awgu
Only half of parameters are saved when applied PP
bug
Something isn't working
#474
opened Jul 22, 2024 by
dmammfl
[FP8 options] Float8Linear vs TransformerEngine
question
Further information is requested
#462
opened Jul 16, 2024 by
yundai424
Question about custom cuda operators for tensor parallelism
question
Further information is requested
#434
opened Jun 28, 2024 by
vermouth1992
Question about Pipeline parallelism
question
Further information is requested
#431
opened Jun 27, 2024 by
vermouth1992
Llama models with custom configurations and uploading to Hugging Face
enhancement
New feature or request
#420
opened Jun 24, 2024 by
bkchang
DataLoader state is empty for different ranks ?
question
Further information is requested
#409
opened Jun 17, 2024 by
ahatamiz
benchmark perf numbers on H100 GPUs and update performance.md
documentation
Improvements or additions to documentation
Add torchdata to requirements after release
better_engineering
Repo code quality improvements
#351
opened May 21, 2024 by
gokulavasan
numerical difference for SDPA between non-dtensor vs dtensor, when math attention and fp16 are used
bug
Something isn't working
#317
opened May 8, 2024 by
tianyu-l
freqs_cis in llama model should be a non-persistent buffer
bug
#316
opened May 8, 2024 by
tianyu-l