bigscience-workshop / Megatron-DeepSpeed Public

Issues: bigscience-workshop/Megatron-DeepSpeed

74 Open 70 Closed

Issues list

How can I set recomputation-granularity, like selective or full?

#403 opened Apr 30, 2024 by LordEdison

Hello, which version of the megatron-lm library is your code modified from?

#401 opened Feb 26, 2024 by 4thGardenOfQMH

Is this assertion for mask wrong?

#400 opened Feb 15, 2024 by yinfangchen

Hello, can Megatron-DeepSpeed pre-train llama2?

#398 opened Oct 12, 2023 by 13416157913

Is a training log like this normal? I do not find the loss in the logs, and what does "grad norm: nan" mean?

#396 opened Aug 27, 2023 by alphanlp

The difference between zero-3 and megatron with zero-2

#395 opened Aug 25, 2023 by nicosouth

Question about the implementation of mpu.cross_entropy when using tensor parallel

#394 opened Aug 3, 2023 by robin087

Questions about inconsistent evaluation results

#392 opened Jul 24, 2023 by coorful

Question about ds to universal

#388 opened May 31, 2023 by saxh

RuntimeError: Error building extension 'scaled_upper_triang_masked_softmax_cuda'

#387 opened May 26, 2023 by zll0000

Hello, I have run into a problem

#386 opened May 22, 2023 by etoilestar

How to properly use Flops Profiler with pipelined parallelism?

#385 opened May 9, 2023 by flyingdown

pip install -e . failed with ModuleNotFoundError: No module named 'torch'

#383 opened May 6, 2023 by SeekPoint

Help me, I'm dying soon, error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1 error: subprocess-exited-with-error

#382 opened May 5, 2023 by listwebit

Megatron-DeepSpeed only applies to specific models?

#381 opened May 4, 2023 by Bob-cby

The given group does not exist pytorch

#379 opened Apr 25, 2023 by germanjke

upgrade megatron-lm

#378 opened Apr 21, 2023 by dz1iang

How can we access the gradients while the model is training?

#377 opened Apr 19, 2023 by BilgehanSel

how to do prompt learning with bloom?

#376 opened Apr 10, 2023 by moseshu

How to freeze some layers of GPT and fine-tune only the last k layers?

#375 opened Apr 4, 2023 by joan126

Can I use Python-only apex for gpt_pretrain?

#373 opened Mar 21, 2023 by Luoyang144

how to pretrain t5-lm adapted?

#372 opened Mar 18, 2023 by nanyyyyyy

How to preprocess data for t5 model?

#371 opened Mar 14, 2023 by xiu-ze

Exception: cuda rng state model-parallel-rng is not added

#369 opened Mar 6, 2023 by 520jefferson

How to continue pre-training Bloom?

#366 opened Feb 26, 2023 by ShinoharaHare
