v0.3.1: Patch Release
## Summary
This patch release brings important updates and fixes to Liger-Kernel. Notable changes include:
- KLDiv calculation fix: The KL divergence loss now computes correctly for larger vocab sizes.
- SwiGLU/GeGLU casting fix: Program IDs are now cast to int64 in the SwiGLU/GeGLU kernels to prevent memory errors with larger dimensions (see the kernel sketch below).
- AutoLigerKernelForCausalLM fix: The wrapper now passes all original keyword arguments through to the underlying model (see the loading example below).
- Post-init model patching fix: Patching a model after initialization now works correctly, so the HF Trainer integration behaves as expected.
- Relaxed transformers dependency: The transformers version constraint has been loosened to improve compatibility with a broader range of versions.
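
The int64 cast matters because a row-indexed kernel computes its base pointer as `program_id * row_stride`; in int32 that product can overflow once the tensor gets large enough. The following is a minimal sketch of the pattern, using a hypothetical element-wise kernel and launcher rather than the actual SwiGLU/GeGLU code:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _double_row_kernel(x_ptr, y_ptr, n_cols, x_row_stride, y_row_stride,
                       BLOCK_SIZE: tl.constexpr):
    # One program per row. Casting the program id to int64 keeps the pointer
    # offset (pid * row_stride) from overflowing int32 when n_rows * stride
    # exceeds 2**31 - 1, the class of memory error addressed in #251.
    pid = tl.program_id(0).to(tl.int64)
    x_ptr += pid * x_row_stride
    y_ptr += pid * y_row_stride
    col_offsets = tl.arange(0, BLOCK_SIZE)
    mask = col_offsets < n_cols
    x = tl.load(x_ptr + col_offsets, mask=mask, other=0.0)
    tl.store(y_ptr + col_offsets, x * 2.0, mask=mask)


def double_rows(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical host-side launcher, not part of Liger-Kernel.
    n_rows, n_cols = x.shape
    y = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    _double_row_kernel[(n_rows,)](x, y, n_cols, x.stride(0), y.stride(0),
                                  BLOCK_SIZE=BLOCK_SIZE)
    return y
```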
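
With the kwargs passthrough fix, loading options are forwarded to the underlying Hugging Face `from_pretrained` call instead of being dropped. A minimal usage sketch follows; the model id and dtype are illustrative placeholders:

```python
import torch
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Keyword arguments such as torch_dtype are now forwarded to the underlying
# Hugging Face from_pretrained call rather than being silently dropped.
model = AutoLigerKernelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model id
    torch_dtype=torch.bfloat16,
)
```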
## What's Changed
- Remove debug print statement by @EdoardoLuciani in #247
- [Easy] Cast program_id to int64 in SwiGLU/GeGLU kernels by @hansonw in #251
- Fix a comment typo in flce by @Tcc0403 in #256
- Fix AutoLigerKernelForCausalLM to pass through original kwargs by @shimizust in #263
- Update contributing guide for adding a new model by @shivam15s in #260
- chore: Add Qwen2.5 and Phi3.5 to Readme by @tyler-romero in #265
- rename cuda mode to gpu mode by @msaroufim in #267
- Fix sharing a ResBlock layer for each head in Medusa example by @chiwanpark in #269
- Fix/kldiv by @S1ro1 in #262
- Post-init model patching fix by @shimizust in #280
- Relaxed transformers dependency by @shimizust in #270
- Disable gemma2 and qwen2_vl tests by @shimizust in #288
- Release version 0.3.1 by @shimizust in #286
## New Contributors
- @EdoardoLuciani made their first contribution in #247
- @msaroufim made their first contribution in #267
Full Changelog: v0.3.0...v0.3.1