Pinned
- huggingface/nanotron: Minimalistic large language model 3D-parallelism training
- huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
- NVIDIA/Megatron-LM: Ongoing research training transformer models at scale
- linkedin/Liger-Kernel: Efficient Triton Kernels for LLM Training
- tomaarsen/attention_sinks: Extend existing LLMs way beyond the original training length with constant memory usage, without retraining