Release OpenVINO™ Model Server 2024.3 · openvinotoolkit/model_server

The 2024.3 release focuses mostly on improvements to the OpenAI API text generation implementation.

Changes and improvements

A set of improvements in OpenAI API text generation:

Significantly better performance thanks to numerous improvements in OpenVINO Runtime and sampling algorithms
Added config parameters best_of_limit and max_tokens_limit to prevent excessive memory consumption caused by invalid requests
Added reporting of LLM metrics in the server logs
Added extra sampling parameters: diversity_penalty, length_penalty, and repetition_penalty
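
As a rough illustration of the new request parameters and limits, the sketch below builds an OpenAI-style completions request body in Python. The function name, the model name, and the two limit values are assumptions made for this example; the actual limits are set in the server configuration through best_of_limit and max_tokens_limit, and these notes do not describe how the server enforces them.

```python
# Placeholder values for illustration only. The real limits come from the
# server configuration (best_of_limit, max_tokens_limit).
ASSUMED_BEST_OF_LIMIT = 20
ASSUMED_MAX_TOKENS_LIMIT = 4096


def build_completion_request(model, prompt, *, max_tokens=128, best_of=1,
                             diversity_penalty=None, length_penalty=None,
                             repetition_penalty=None):
    """Return a JSON-serializable body for an OpenAI-style completions request.

    Hypothetical helper: it mirrors on the client side the kind of check
    the new limits make possible, rejecting oversized requests early.
    """
    if best_of > ASSUMED_BEST_OF_LIMIT:
        raise ValueError(f"best_of={best_of} exceeds the assumed limit")
    if max_tokens > ASSUMED_MAX_TOKENS_LIMIT:
        raise ValueError(f"max_tokens={max_tokens} exceeds the assumed limit")

    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "best_of": best_of,
    }
    # Only send the extra sampling parameters that were explicitly set.
    optional = {
        "diversity_penalty": diversity_penalty,
        "length_penalty": length_penalty,
        "repetition_penalty": repetition_penalty,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Calling build_completion_request("my-model", "Hello", repetition_penalty=1.1) returns a dictionary that can be serialized with json.dumps and posted to the completions endpoint of a running server.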

Improvements in documentation and demos:

Added RAG demo with OpenAI API
Added K8S deployment demo for text generation scenarios
Simplified model initialization for a set of demos with MediaPipe graphs using the pose_detection model. TFLite models don't require any conversion

Breaking changes

No breaking changes.

Bug fixes

Resolved issue with sporadic text generation hang via OpenAI API endpoints
Fixed issue with the chat streamer affecting incomplete UTF-8 sequences
Corrected format of the last streaming event in completions endpoint
Fixed issue with request hanging when running out of available cache
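
The UTF-8 item above concerns a general streaming problem: a multi-byte character can be split across two chunks, so each chunk cannot be decoded on its own. These notes do not describe how the server fix works; the Python sketch below only illustrates the problem with the standard library's incremental decoder, and the function name stream_text is made up for this example.

```python
import codecs


def stream_text(byte_chunks):
    """Yield decoded text pieces from an iterable of UTF-8 byte chunks.

    The incremental decoder keeps incomplete multi-byte sequences in an
    internal buffer until the remaining bytes arrive, so no partial
    character is ever emitted mid-stream.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush at end of stream; a truly truncated sequence becomes U+FFFD.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

For example, feeding the bytes of "héllo €" one byte at a time still yields the original string once the pieces are joined, whereas decoding each byte independently would fail on the two-byte and three-byte characters.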

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2024.3 - CPU device support with the image based on Ubuntu 22.04
docker pull openvino/model_server:2024.3-gpu - GPU and CPU device support with the image based on Ubuntu 22.04
or use the provided binary packages.
A prebuilt image is also available in the Red Hat Ecosystem Catalog.
