Add customized static cache implementation #4385
This PR is based on the prototype done in huggingface/transformers#31706. We want to export HF models to ExecuTorch, but we cannot do that right now because its `Cache` is not a `torch.nn.Module`. This PR solves the problem by implementing a customized `StaticCache` that is not only a `Cache` but also a `torch.nn.Module`. Most of its implementation is copied from transformers, with the following modifications:

- Make it a `torch.nn.Module` and register the cache tensors via `register_buffer` calls, copied from "[Demo][ExecuTorch] Lower and run native Gemma e2e in ExecuTorch" (huggingface/transformers#31706).
- Adjust the `StaticCache` implementation: 1. `get_seq_length` should return a number instead of a tensor; 2. `update` should return only the filled cache slots instead of the whole static cache. A sketch of the resulting class follows this list.
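The sketch below is a minimal illustration of the idea under assumed names and signatures (`ETStaticCache`, the constructor arguments, and a single layer's buffers instead of one pair per layer), not the exact code in this diff; the two numbered modifications above are marked with comments.

```python
# Minimal sketch, not the code from this PR. All names here are
# illustrative assumptions; the real class copies most of its body from
# transformers' StaticCache. For brevity this keeps one layer's cache,
# whereas a real implementation holds one buffer pair per layer and
# indexes them with `layer_idx`.
from typing import Any, Dict, Optional, Tuple

import torch
from transformers.cache_utils import Cache


class ETStaticCache(Cache, torch.nn.Module):
    """A static KV cache that is both a `Cache` and a `torch.nn.Module`."""

    def __init__(
        self,
        max_batch_size: int,
        max_cache_len: int,
        num_heads: int,
        head_dim: int,
        dtype: torch.dtype = torch.float32,
    ) -> None:
        Cache.__init__(self)
        torch.nn.Module.__init__(self)
        cache_shape = (max_batch_size, num_heads, max_cache_len, head_dim)
        # Registering the cache tensors as buffers (rather than keeping
        # them as plain attributes) makes them part of the module state
        # that torch.export / ExecuTorch can capture.
        self.register_buffer("k_cache", torch.zeros(cache_shape, dtype=dtype))
        self.register_buffer("v_cache", torch.zeros(cache_shape, dtype=dtype))

    def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
        # Modification 1: return a plain Python number, not a tensor.
        # Occupied slots are detected by any nonzero entry along head_dim.
        return int(self.k_cache[0, 0].any(dim=-1).sum())

    def update(
        self,
        key_states: torch.Tensor,
        value_states: torch.Tensor,
        layer_idx: int,
        cache_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        # `cache_position` tells us which slots this call fills, mirroring
        # the upstream StaticCache.update contract.
        cache_position = cache_kwargs.get("cache_position")
        self.k_cache[:, :, cache_position] = key_states
        self.v_cache[:, :, cache_position] = value_states
        # Modification 2: return only the filled slots rather than the
        # whole (mostly empty) static cache.
        end = int(cache_position[-1]) + 1
        return self.k_cache[:, :, :end], self.v_cache[:, :, :end]
```

Because the cache is now a `torch.nn.Module`, its key/value buffers are part of the module state and visible to `torch.export`, which is what the ExecuTorch lowering path needs.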
Test Plan:
Make sure the following commands generate exactly the same output, and that the latter one, with the KV cache enabled, is faster: