OpenVINO™ Model Server 2024.3
michalkulakowski
released this
31 Jul 14:15
·
5 commits
to releases/2024/3
since this release
The 2024.3 release focus mostly on improvements in OpenAI API text generation implementation.
Changes and improvements
A set of improvements in OpenAI API text generation:
- Significantly better performance thanks to numerous improvements in OpenVINO Runtime and sampling algorithms
- Added config parameters
best_of_limit
andmax_tokens_limit
to avoid memory overconsumption impact from invalid requests Read more - Added reporting LLM metrics in the server logs Read more
- Added extra sampling parameters
diversity_penalty
,length_penalty
,repetition_penalty
. Read more
Improvements in documentation and demos:
- Added RAG demo with OpenAI API
- Added K8S deployment demo for text generation scenarios
- Simplified models initialization for a set of demos with mediapipe graphs using pose_detection model. TFLite models don't required any conversions Check demo
Breaking changes
No breaking changes.
Bug fixes
- Resolved issue with sporadic text generation hang via OpenAI API endpoints
- Fixed issue with chat streamer impacting incomplete utf-8 sequences
- Corrected format of the last streaming event in
completions
endpoint - Fixed issue with request hanging when running out of available cache
You can use an OpenVINO Model Server public Docker images based on Ubuntu via the following command:
docker pull openvino/model_server:2024.3
- CPU device support with the image based on Ubuntu22.04
docker pull openvino/model_server:2024.3-gpu
- GPU and CPU device support with the image based on Ubuntu22.04
or use provided binary packages.
The prebuilt image is available also on RedHat Ecosystem Catalog