
OpenVINO™ Model Server 2024.3

@michalkulakowski released this 31 Jul 14:15
· 5 commits to releases/2024/3 since this release
a6ddd3f

The 2024.3 release focuses mostly on improvements to the OpenAI API text generation implementation.

Changes and improvements

A set of improvements in OpenAI API text generation:

  • Significantly better performance thanks to numerous improvements in OpenVINO Runtime and sampling algorithms
  • Added config parameters best_of_limit and max_tokens_limit to avoid memory overconsumption caused by invalid requests. Read more
  • Added reporting of LLM metrics in the server logs. Read more
  • Added extra sampling parameters: diversity_penalty, length_penalty, and repetition_penalty. Read more
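The two limit parameters above are set in the servable's graph configuration rather than per request. Below is a minimal sketch of the relevant node_options fragment of a graph.pbtxt, assuming the HttpLLMCalculator setup used by the OVMS text-generation demos; the surrounding streams are omitted and the values are illustrative assumptions, not defaults from this release:

```
node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  node_options: {
    [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
      models_path: "./"
      # Illustrative values: cap generation length and best_of fan-out
      # so a single malformed request cannot exhaust KV-cache memory.
      max_tokens_limit: 4096
      best_of_limit: 3
    }
  }
}
```

Requests exceeding either limit are expected to be rejected up front instead of allocating memory for them.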

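The extra sampling parameters are passed in a standard OpenAI-style request body. The sketch below builds such a body with only the standard library; the model name "llama" and all numeric values are illustrative assumptions, not taken from this release:

```python
import json

# Sketch of an OpenAI-compatible chat completions request body using the
# sampling parameters added in 2024.3. "llama" is a hypothetical served
# model name; substitute whatever your configuration actually serves.
payload = {
    "model": "llama",
    "messages": [
        {"role": "user",
         "content": "Summarize OpenVINO Model Server in one sentence."}
    ],
    "max_tokens": 128,
    # Extra sampling parameters introduced in this release:
    "repetition_penalty": 1.1,   # values > 1.0 discourage repeated tokens
    "length_penalty": 1.0,       # rescales scores by sequence length
    "diversity_penalty": 1.0,    # pushes beam groups toward distinct outputs
}
body = json.dumps(payload)
print(body)
```

The resulting JSON can be POSTed to the server's chat completions endpoint with any HTTP client.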
Improvements in documentation and demos:

  • Added RAG demo with OpenAI API
  • Added K8S deployment demo for text generation scenarios
  • Simplified model initialization for a set of demos with MediaPipe graphs using the pose_detection model. TFLite models don't require any conversion. Check demo

Breaking changes

No breaking changes.

Bug fixes

  • Resolved an issue with sporadic text generation hangs in the OpenAI API endpoints
  • Fixed an issue with the chat streamer mishandling incomplete UTF-8 sequences
  • Corrected the format of the last streaming event in the completions endpoint
  • Fixed an issue with requests hanging when running out of available cache

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2024.3 - CPU device support with an image based on Ubuntu 22.04
docker pull openvino/model_server:2024.3-gpu - GPU and CPU device support with an image based on Ubuntu 22.04
or use the provided binary packages.
The prebuilt image is also available on the Red Hat Ecosystem Catalog.