CTranslate2 3.12.0

@guillaumekln guillaumekln released this 17 Apr 18:22
· 237 commits to master since this release

New features

  • Add methods Generator.generate_tokens and Translator.generate_tokens, returning a generator that yields tokens as soon as they are produced by the model (not compatible with beam search)
  • Improve the performance of rotary embeddings on CPU with an alternative implementation, enabled by setting rotary_interleave=False in the model specification (may require permuting the QK weights)
  • Support a variable number of input frames in the method Whisper.align to improve batch support
  • Expose the flag low_cpu_mem_usage in the Transformers converter to reduce memory usage when loading large models (requires the package accelerate)
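
The streaming methods can be consumed like any Python generator. A minimal sketch of the consumption pattern, using a hypothetical `fake_generate_tokens` stand-in for an actual `Translator.generate_tokens` call (which would require a converted model on disk):

```python
# Sketch only: `fake_generate_tokens` stands in for the real
# Translator.generate_tokens / Generator.generate_tokens methods,
# which yield tokens one at a time (beam search is not supported).

def fake_generate_tokens(prompt_tokens):
    # Stand-in for the model call: yields tokens as they are "generated".
    for token in ["▁Hello", "▁world", "</s>"]:
        yield token

def stream_detokenize(token_iter, end_token="</s>"):
    """Accumulate streamed SentencePiece-style tokens into a string."""
    pieces = []
    for token in token_iter:
        if token == end_token:
            break
        pieces.append(token.replace("▁", " "))
    return "".join(pieces).strip()

print(stream_detokenize(fake_generate_tokens(["▁Hi"])))  # prints: Hello world
```

In a real application, each token can be forwarded to the user (e.g. over a websocket) as soon as it is yielded, instead of waiting for the full result.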
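
Permuting QK weights for a non-interleaved rotary layout amounts to reordering rows of the Q and K projections within each head, so that interleaved rotary pairs (x0, x1, x2, x3, ...) become two contiguous halves (x0, x2, ..., x1, x3, ...). A NumPy sketch of one such permutation; this illustrates the idea and is not necessarily the exact transform required by every model:

```python
import numpy as np

def permute_qk_non_interleaved(weight, num_heads):
    """Reorder the rows of a Q or K projection matrix so that rotary
    dimensions that were interleaved per head become split into halves.

    weight: array of shape (num_heads * head_dim, input_dim)
    """
    out_dim, in_dim = weight.shape
    head_dim = out_dim // num_heads
    # Group rows per head into (pair, 2) blocks, then gather all even
    # rows first and all odd rows second.
    w = weight.reshape(num_heads, head_dim // 2, 2, in_dim)
    w = w.transpose(0, 2, 1, 3)
    return w.reshape(out_dim, in_dim)
```

For a single head with rows [r0, r1, r2, r3], the result is [r0, r2, r1, r3], i.e. even-indexed rotary dimensions followed by odd-indexed ones.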

Fixes and improvements

  • Fix crash in Whisper.align when num_frames // 2 <= median_filter_width
  • Raise an error if arguments end_token or suppress_sequences contain tokens that are not in the vocabulary
  • Optimize the quantization of FP16 weights during model conversion
  • In the Transformers converter, also load the model weights in FP16 when the selected quantization is int8_float16
  • Update the Whisper timestamp decoding rules to prevent the generation of segments with zero duration
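
For context on the quantization-related items above: int8 quantization of weight matrices is commonly implemented with per-row absolute-maximum scales. The following NumPy sketch illustrates that general scheme; it is an assumption for illustration, not CTranslate2's actual implementation:

```python
import numpy as np

def quantize_int8_rowwise(weights_fp16):
    """Per-row absmax int8 quantization sketch (not CTranslate2's code).

    Returns the int8 weights and one FP16 scale per output row, so the
    original values can be approximated as q * scale.
    """
    w = weights_fp16.astype(np.float32)        # compute scales in FP32
    scales = np.abs(w).max(axis=1) / 127.0     # one scale per row
    scales[scales == 0.0] = 1.0                # avoid division by zero
    q = np.clip(np.round(w / scales[:, None]), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16)

w = np.array([[0.5, -1.0], [2.0, 0.25]], dtype=np.float16)
q, s = quantize_int8_rowwise(w)
```

Loading the weights in FP16 up front (as the converter now does for int8_float16) halves the peak memory needed before this quantization step, compared to loading in FP32.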