
Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer #32860

Merged
merged 25 commits into from
Aug 23, 2024

Conversation

JasonZhu1313
Contributor

@JasonZhu1313 JasonZhu1313 commented Aug 17, 2024

What does this PR do?

Integrates the Liger (LinkedIn GPU Efficient Runtime) Kernel into the HF Trainer behind an optional flag
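As a rough sketch of the opt-in pattern the PR describes (the flag was later renamed `use_liger_kernel` per the commit history; the registry, class, and function names below are illustrative assumptions, not the Trainer's actual code):

```python
from dataclasses import dataclass

# Hypothetical registry: model type -> patch function. The real integration
# dispatches to liger-kernel's per-architecture patchers instead.
_LIGER_PATCHERS = {
    "llama": lambda: "patched llama with Liger kernels",
}

@dataclass
class MiniTrainingArgs:
    # Mirrors the optional opt-in flag this PR adds (off by default).
    use_liger_kernel: bool = False

def apply_liger_kernel(model_type: str, args: MiniTrainingArgs) -> str:
    """Patch the model's modules only when the user has opted in."""
    if not args.use_liger_kernel:
        return "unpatched"
    patcher = _LIGER_PATCHERS.get(model_type)
    if patcher is None:
        return f"no Liger patch available for {model_type!r}"
    return patcher()

print(apply_liger_kernel("llama", MiniTrainingArgs(use_liger_kernel=True)))
```

The key design point is that the default path is untouched: users who never set the flag get stock modules, and unknown model types degrade gracefully instead of failing.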

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Tests:

  • pytest tests/trainer/test_trainer.py::TrainerIntegrationTest::test_apply_liger_kernel
  • pytest tests/trainer/test_trainer.py::TrainerIntegrationTest::test_use_liger_kernel_patching tests/trainer/test_trainer.py::TrainerIntegrationTest::test_use_liger_kernel_trainer

======================================= test session starts ========================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0
rootdir: /content/transformers-jaszhu
configfile: pyproject.toml
plugins: rich-0.1.1, timeout-2.3.1, xdist-3.6.1
collected 2 items

tests/trainer/test_trainer.py ..                                                             [100%]

======================================== 2 passed in 9.47s =========================================


  • E2E test
{'loss': 1.6157, 'grad_norm': 32.0, 'learning_rate': 2.4324324324324326e-07, 'epoch': 0.0, 'num_input_tokens_seen': 60416, 'step': 3, 'step_time_sec': 4.87, 'avg_step_time_sec': 6.82, 'time_to_completion_sec': 4970.4, 'estimated_total_time_sec': 4990.85, 'step_peak_memory_allocated_MB': 76728.45, 'total_peak_memory_allocated_MB': 76728.74, 'step_peak_memory_reserved_MB': 79692.0, 'total_peak_memory_reserved_MB': 80364.0, 'step_tokens_per_second': 3138.55, 'avg_tokens_per_second': 3158.65}
{'loss': 1.5678, 'grad_norm': 26.875, 'learning_rate': 3.2432432432432436e-07, 'epoch': 0.01, 'num_input_tokens_seen': 84992, 'step': 4, 'step_time_sec': 7.82, 'avg_step_time_sec': 7.15, 'time_to_completion_sec': 5206.53, 'estimated_total_time_sec': 5235.14, 'step_peak_memory_allocated_MB': 76728.67, 'total_peak_memory_allocated_MB': 76728.74, 'step_peak_memory_reserved_MB': 80194.0, 'total_peak_memory_reserved_MB': 80364.0, 'step_tokens_per_second': 3142.99, 'avg_tokens_per_second': 3152.94}
{'loss': 1.74, 'grad_norm': 28.875, 'learning_rate': 4.0540540540540546e-07, 'epoch': 0.01, 'num_input_tokens_seen': 103936, 'step': 5, 'step_time_sec': 5.75, 'avg_step_time_sec': 6.8, 'time_to_completion_sec': 4945.07, 'estimated_total_time_sec': 4979.08, 'step_peak_memory_allocated_MB': 76728.54, 'total_peak_memory_allocated_MB': 76728.74, 'step_peak_memory_reserved_MB': 80324.0, 'total_peak_memory_reserved_MB': 80364.0, 'step_tokens_per_second': 3293.14, 'avg_tokens_per_second': 3182.59}
{'loss': 1.7297, 'grad_norm': 29.25, 'learning_rate': 4.864864864864865e-07, 'epoch': 0.01, 'num_input_tokens_seen': 124416, 'step': 6, 'step_time_sec': 6.23, 'avg_step_time_sec': 6.69, 'time_to_completion_sec': 4855.78, 'estimated_total_time_sec': 4895.91, 'step_peak_memory_allocated_MB': 76728.57, 'total_peak_memory_allocated_MB': 76728.74, 'step_peak_memory_reserved_MB': 80288.0, 'total_peak_memory_reserved_MB': 80364.0, 'step_tokens_per_second': 3285.22, 'avg_tokens_per_second': 3201.72}
{'loss': 1.6393, 'grad_norm': 27.75, 'learning_rate': 5.675675675675676e-07, 'epoch': 0.01, 'num_input_tokens_seen': 153920, 'step': 7, 'step_time_sec': 9.22, 'avg_step_time_sec': 7.11, 'time_to_completion_sec': 5154.73, 'estimated_total_time_sec': 5204.5, 'step_peak_memory_allocated_MB': 76728.78, 'total_peak_memory_allocated_MB': 76728.78, 'step_peak_memory_reserved_MB': 79652.0, 'total_peak_memory_reserved_MB': 80364.0, 'step_tokens_per_second': 3200.77, 'avg_tokens_per_second': 3201.51}
{'loss': 1.5642, 'grad_norm': 27.25, 'learning_rate': 6.486486486486487e-07, 'epoch': 0.01, 'num_input_tokens_seen': 170752, 'step': 8, 'step_time_sec': 5.49, 'avg_step_time_sec': 6.88, 'time_to_completion_sec': 4980.15, 'estimated_total_time_sec': 5035.18, 'step_peak_memory_allocated_MB': 76728.49, 'total_peak_memory_allocated_MB': 76728.78, 'step_peak_memory_reserved_MB': 79988.0, 'total_peak_memory_reserved_MB': 80364.0, 'step_tokens_per_second': 3065.48, 'avg_tokens_per_second': 3186.0}
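A quick sanity check on the logged metrics above (values copied from the step 3 and step 4 dicts; the metric definition is inferred from the numbers, not taken from the Trainer source): `step_tokens_per_second` appears to be the tokens processed in that step divided by `step_time_sec`.

```python
# Values copied from the E2E logs above.
step3_tokens = 60416   # num_input_tokens_seen at step 3
step4_tokens = 84992   # num_input_tokens_seen at step 4
step4_time = 7.82      # step_time_sec at step 4 (rounded in the log)

tokens_this_step = step4_tokens - step3_tokens
tps = tokens_this_step / step4_time
print(round(tps, 2))  # close to the logged step_tokens_per_second of 3142.99
```

The small discrepancy versus the logged 3142.99 comes from `step_time_sec` being rounded to two decimals in the log line.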

  • When an older liger-kernel version is installed, the expected error is thrown: ImportError: You have set `use_liger` to `True` but liger-kernel >= 0.1.0 is not available. Please install it with `pip install liger-kernel`
  • Model type is correctly extracted as "llama"
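The version guard above can be sketched as follows; the helper name and the simplified version parsing are assumptions, not the Trainer's actual code (the model type, for reference, is available on a transformers model as `model.config.model_type`):

```python
from __future__ import annotations

MIN_LIGER_VERSION = (0, 1, 0)  # minimum version cited by the error message

def check_liger_version(installed: str | None) -> None:
    """Raise the ImportError quoted above when liger-kernel is missing
    (installed is None) or older than the minimum supported version."""
    ok = (
        installed is not None
        and tuple(int(p) for p in installed.split(".")[:3]) >= MIN_LIGER_VERSION
    )
    if not ok:
        raise ImportError(
            "You have set `use_liger` to `True` but liger-kernel >= 0.1.0 "
            "is not available. Please install it with `pip install liger-kernel`"
        )
```

In practice the installed version would be detected with something like `importlib.metadata.version("liger-kernel")`, catching `PackageNotFoundError` to produce the `None` case.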

[Screenshot, 2024-08-20]

Test conditions: LLaMA 3-8B, Batch Size = 64, Data Type = bf16, Optimizer = AdamW, Gradient Checkpointing = True, Distributed Strategy = FSDP1 on 4 A100s.

With use_liger=True, memory usage and throughput show improvement compared to the default use_liger=False.

[Benchmark charts: memory usage and throughput comparison]

Note: for a more detailed benchmark setup and further efficiency gains for multi-head training (Medusa), please refer to the original repo: https://github.com/linkedin/Liger-Kernel (the repo will be public soon!!!)
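Outside the Trainer flag, Liger-Kernel can also be applied directly. This is a hedged sketch: the patcher name follows the project's README (`apply_liger_kernel_to_llama`, which monkey-patches the LLaMA modeling modules in place), and the broad `except` is deliberate, since importing and applying it only succeeds in an environment with liger-kernel, triton, and transformers installed.

```python
# Hedged sketch of using Liger-Kernel directly, outside the Trainer flag.
try:
    from liger_kernel.transformers import apply_liger_kernel_to_llama

    apply_liger_kernel_to_llama()  # monkey-patches LLaMA modules in place
    status = "liger kernels applied"
except Exception:
    # Broad catch on purpose: availability depends on the environment
    # (GPU, triton, liger-kernel, transformers), so degrade gracefully.
    status = "liger-kernel unavailable; using stock modules"
print(status)
```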

@amyeroberts
Collaborator

cc @ArthurZucker @muellerzr

Collaborator

@ArthurZucker ArthurZucker left a comment


Sounds great to me! Let's make sure we add a tad bit of doc about it! 🤗

@JasonZhu1313 JasonZhu1313 changed the title [WIP] Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer Integrate Liger (Linkedin GPU Efficient Runtime) Kernel to Trainer Aug 19, 2024
Contributor

@muellerzr muellerzr left a comment


Thanks! Can you rebase from main? (This should fix the CI I think)

@SunMarc SunMarc mentioned this pull request Aug 20, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@JasonZhu1313 JasonZhu1313 marked this pull request as ready for review August 20, 2024 21:52
Review threads (outdated, resolved): docs/source/en/trainer.md, src/transformers/training_args.py
@ByronHsu
Contributor

lgtm!

Member

@SunMarc SunMarc left a comment


Nice ! Just a nit ! Also, let us know when you want to merge this PR as the Liger repo is still not public.

Review thread (outdated, resolved): tests/trainer/test_trainer.py
@muellerzr
Contributor

@JasonZhu1313 if you run make fixup it should fix the quality tests :) Otherwise as Marc said, let us know when we're okay to land this and we'll merge it immediately 🚀

@JasonZhu1313
Contributor Author

@JasonZhu1313 if you run make fixup it should fix the quality tests :) Otherwise as Marc said, let us know when we're okay to land this and we'll merge it immediately 🚀

Thanks, the repo will be open-sourced on Friday.

@JasonZhu1313
Contributor Author

@JasonZhu1313 if you run make fixup it should fix the quality tests :) Otherwise as Marc said, let us know when we're okay to land this and we'll merge it immediately 🚀

Thanks, the repo will be open-sourced on Friday.

The code is now open to the public; we are ready to merge the PR!

Contributor

@ByronHsu ByronHsu left a comment


Excited to collaborate with Hugging Face!!

@SunMarc
Member

SunMarc commented Aug 23, 2024

Nice ! Merging !

@SunMarc SunMarc merged commit adb9117 into huggingface:main Aug 23, 2024
24 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Aug 30, 2024
…uggingface#32860)

* add liger integration

* fix syntax

* fix import issue

* add trainer.md

* Use _apply_liger_kernel()

* Fixed log message

* Update docs/source/en/trainer.md

Co-authored-by: Marc Sun <[email protected]>

* Update docs/source/en/trainer.md

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/training_args.py

Co-authored-by: Byron Hsu <[email protected]>

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/training_args.py

Co-authored-by: Byron Hsu <[email protected]>

* Update docs/source/en/trainer.md

Co-authored-by: Byron Hsu <[email protected]>

* Fixed checkstyle and updated readme

* Added test

* Fixed checkstyle

* fix docstring

* rename use_liger to use_liger_kernel

* Trigger Build

* Added test

* add fix-copies

* Fixed copy inconsistencies

---------

Co-authored-by: shimizust <[email protected]>
Co-authored-by: Steven Shimizu <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Byron Hsu <[email protected]>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Aug 30, 2024
…uggingface#32860)

itazap pushed a commit to NielsRogge/transformers that referenced this pull request Sep 20, 2024
…uggingface#32860)
