8-bit allgather support #722

yaroslavvb · 2024-09-19T18:37:17Z

❓ The question

Is there a plan, or any partial work, toward supporting 8-bit AllGather in Olmo?
https://dev-discuss.pytorch.org/t/enabling-float8-all-gather-in-fsdp2/2359

The authors report a 50% throughput improvement when training Llama 70B with on-par numerics, which seems significant (depending on what "on-par numerics" means).

dirkgr · 2024-10-19T00:41:20Z

My understanding is that 8-bit all_gather is something you would only do if you're already doing compute in 8 bits. We have an experimental branch for 8-bit compute here: https://github.com/allenai/OLMo-core/tree/epwalsh/float8-investigation. It uses the new, faster trainer. So far it is faster by an impressive margin (though not 50%), but we have not vetted it at larger scales.

8-bit all_gather would be another step after that.
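The reasoning above can be illustrated with a toy sketch in plain Python. It is not how OLMo, OLMo-core, or FSDP2 implement float8 all-gather: the quantizer is a uniform integer grid standing in for a real float8 cast, "ranks" are simulated as Python lists, and every function name is hypothetical. It shows that with one shared per-tensor scale, casting each shard before the gather yields exactly the same 8-bit weights as gathering first and casting right before the matmul, so nothing extra is lost if compute is already in 8 bits. If compute stays in bf16, the 8-bit gather instead introduces rounding error that a bf16 gather would not. The byte count assumes 70B parameters and 8 ranks purely for illustration.

```python
# Toy sketch (hypothetical names): why 8-bit all-gather pairs with 8-bit compute.
# Assumption: a uniform symmetric integer grid stands in for a real float8 cast.

def quantize(values, scale, levels=127):
    """Stand-in for a low-precision cast: scale, round to a grid, clamp."""
    return [max(-levels, min(levels, round(v / scale))) for v in values]

def dequantize(codes, scale):
    """Map grid codes back to real values."""
    return [c * scale for c in codes]

def global_scale(shards, levels=127):
    """Per-tensor scale from the global amax.

    In a real multi-rank setup, finding the global amax would itself need a
    collective across ranks (an assumption of this sketch, simulated here).
    """
    amax = max(abs(v) for shard in shards for v in shard)
    return amax / levels if amax > 0 else 1.0

def all_gather(shards):
    """Simulated all-gather: every rank ends up with all shards concatenated."""
    return [v for shard in shards for v in shard]

def all_gather_bytes(num_params, bytes_per_param, world_size):
    """Bytes each rank must receive: every shard except its own."""
    return num_params * bytes_per_param * (world_size - 1) // world_size

# Weights sharded across 4 simulated ranks.
shards = [[0.5, -1.25], [3.0, 0.01], [-2.5, 0.75], [1.0, -0.3]]
scale = global_scale(shards)

# Path A: gather in high precision, cast to 8 bits just before the matmul.
gather_then_cast = quantize(all_gather(shards), scale)

# Path B: cast each shard to 8 bits first, then gather the 1-byte codes.
cast_then_gather = all_gather([quantize(s, scale) for s in shards])

# With a shared scale the two paths agree exactly.
assert gather_then_cast == cast_then_gather

# But if compute stayed in high precision, path B loses information
# (e.g. 0.01 rounds to 0 on this grid).
assert dequantize(cast_then_gather, scale) != all_gather(shards)

# Communication volume: 8-bit weights halve the bytes of a bf16 gather.
bf16_bytes = all_gather_bytes(70_000_000_000, 2, 8)
fp8_bytes = all_gather_bytes(70_000_000_000, 1, 8)
assert fp8_bytes * 2 == bf16_bytes
```

The design choice the sketch highlights is the shared scale: if each shard picked its own local scale, casting before the gather would no longer match casting the full tensor afterward.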

yaroslavvb added the type/question label on Sep 19, 2024.
