v0.3.1: Patch Release
## Summary
This patch release brings important updates and fixes to Liger-Kernel. Notable changes include:
- KLDiv calculation fix: The KL divergence loss now computes correctly for larger vocab sizes.
- SwiGLU/GeGLU casting fix: Program IDs are now cast to int64 in the SwiGLU/GeGLU kernels to prevent memory errors with larger dimensions (see the kernel sketch below).
- AutoLigerKernelForCausalLM fix: The wrapper now passes all original keyword arguments through to the underlying model (see the loading example below).
- Post-init model patching fix: Patching a model after initialization now works correctly, so the HF Trainer integration behaves as expected.
- Relaxed transformers dependency: The transformers version constraint has been loosened to improve compatibility with a broader range of versions.
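
The int64 cast matters because a row-indexed kernel computes its base pointer as `program_id * row_stride`; in int32 that product can overflow once the tensor gets large enough. The following is a minimal sketch of the pattern, using a hypothetical element-wise kernel and launcher rather than the actual SwiGLU/GeGLU code:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _double_row_kernel(x_ptr, y_ptr, n_cols, x_row_stride, y_row_stride,
                       BLOCK_SIZE: tl.constexpr):
    # One program per row. Casting the program id to int64 keeps the pointer
    # offset (pid * row_stride) from overflowing int32 when n_rows * stride
    # exceeds 2**31 - 1, the class of memory error addressed in #251.
    pid = tl.program_id(0).to(tl.int64)
    x_ptr += pid * x_row_stride
    y_ptr += pid * y_row_stride
    col_offsets = tl.arange(0, BLOCK_SIZE)
    mask = col_offsets < n_cols
    x = tl.load(x_ptr + col_offsets, mask=mask, other=0.0)
    tl.store(y_ptr + col_offsets, x * 2.0, mask=mask)


def double_rows(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical host-side launcher, not part of Liger-Kernel.
    n_rows, n_cols = x.shape
    y = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    _double_row_kernel[(n_rows,)](x, y, n_cols, x.stride(0), y.stride(0),
                                  BLOCK_SIZE=BLOCK_SIZE)
    return y
```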
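
With the kwargs passthrough fix, loading options are forwarded to the underlying Hugging Face `from_pretrained` call instead of being dropped. A minimal usage sketch follows; the model id and dtype are illustrative placeholders:

```python
import torch
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Keyword arguments such as torch_dtype are now forwarded to the underlying
# Hugging Face from_pretrained call rather than being silently dropped.
model = AutoLigerKernelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model id
    torch_dtype=torch.bfloat16,
)
```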
## What's Changed
- Remove debug print statement by @EdoardoLuciani in #247
- [Easy] Cast program_id to int64 in SwiGLU/GeGLU kernels by @hansonw in #251
- Fix a comment typo in flce by @Tcc0403 in #256
- Fix AutoLigerKernelForCausalLM to pass through original kwargs by @shimizust in #263
- Update contributing guide for adding a new model by @shivam15s in #260
- chore: Add Qwen2.5 and Phi3.5 to Readme by @tyler-romero in #265
- rename cuda mode to gpu mode by @msaroufim in #267
- Fix sharing a ResBlock layer for each head in Medusa example by @chiwanpark in #269
- Fix/kldiv by @S1ro1 in #262
- Post-init model patching fix by @shimizust in #280
- Relaxed transformers dependency by @shimizust in #270
- Disable gemma2 and qwen2_vl tests by @shimizust in #288
- Release version 0.3.1 by @shimizust in #286
## New Contributors
- @EdoardoLuciani made their first contribution in #247
- @msaroufim made their first contribution in #267
Full Changelog: v0.3.0...v0.3.1