Releases: openvinotoolkit/model_server
OpenVINO™ Model Server 2022.3
The 2022.3 version is a major release. It includes several new features, enhancements and bug fixes.
New Features
Import TensorFlow Models – preview feature
OpenVINO Model Server can now load TensorFlow models directly from the model repository. Converting to OpenVINO Intermediate Representation (IR) format with model optimizer is not required. This is a preview feature with several limitations. The model must be in a frozen graph format with .pb extension. Loaded models take advantage of all OpenVINO optimizations. Learn more about it and check this demo.
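As a minimal illustration (directory and file names are illustrative; see the documentation for details), a frozen graph is placed in the standard model repository layout, inside a numbered version directory, just like other model formats:
models/
└── my_tf_model/
    └── 1/
        └── model.pb
The model can then be served with --model_path /models/my_tf_model --model_name my_tf_model.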
C API interface to the model server internal functions – preview feature
It is now possible to leverage the model management functionality in OpenVINO Model Server for local inference execution within an application. Just dynamically link the OVMS shared library to take advantage of its new C API and use internal model server functions in C/C++ applications. To learn more see the documentation and check this demo.
Extended KServe gRPC API
The KServe gRPC API implemented in OpenVINO Model Server has been extended to support both input and output in the form of tensor data and raw data. The output format is consistent with the input format. This extension makes it possible to use the Triton Client library with OpenVINO Model Server to send inference requests. The input data can be prepared as vectors or encoded as JPEG/PNG and sent as bytes. Learn more about the current API and check the Python and C++ samples.
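For example, a minimal Python sketch using the Triton client library over gRPC could look like the following (the address, model name and tensor names are placeholders to adapt to your deployment):
# Requires: pip install tritonclient[grpc] numpy
import numpy as np
import tritonclient.grpc as grpcclient

# Assumes OVMS exposes the KServe gRPC API on localhost:9000
client = grpcclient.InferenceServerClient("localhost:9000")

# Placeholder model and tensor names; replace with your model's metadata
data = np.zeros((1, 3, 224, 224), dtype=np.float32)
infer_input = grpcclient.InferInput("input", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[infer_input])
print(result.as_numpy("output"))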
Extended KServe REST API
The KServe REST API now has additional functionality that improves compatibility with the Triton Inference Server extension. It is now possible to send raw data in an HTTP request outside of the JSON content. The concatenated bytes are interpreted by the model server based on the header content. This makes it quick and easy to serialize data from numpy arrays/vectors and to send JPEG/PNG-encoded images.
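A corresponding REST sketch with the Triton client library, which handles the extra header and the raw payload outside of the JSON body (address, model name and tensor names are placeholders):
# Requires: pip install tritonclient[http] numpy
import numpy as np
import tritonclient.http as httpclient

# Assumes the OVMS REST endpoint listens on localhost:8000
client = httpclient.InferenceServerClient("localhost:8000")

data = np.zeros((1, 3, 224, 224), dtype=np.float32)
infer_input = httpclient.InferInput("input", list(data.shape), "FP32")
# binary_data=True sends the tensor as raw bytes appended after the JSON part
infer_input.set_data_from_numpy(data, binary_data=True)

result = client.infer(model_name="my_model", inputs=[infer_input])
print(result.as_numpy("output"))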
Added Support for Intel® Data Center GPU Flex and Intel® Arc GPU
OpenVINO Model Server now officially supports Intel® Data Center GPU Flex and Intel® Arc GPU cards. Learn more about using discrete GPU devices.
C++ Sample Inference Client Applications using KServe API
New client code samples demonstrate KServe API usage and illustrate typical data formats and scenarios. Check out the samples.
Extended Python Client Samples using KServe API
Python client code samples have been extended to include new API features for both the gRPC and REST interfaces.
Added integration with OpenVINO plugin for NVIDIA GPU
OpenVINO Model Server can now also be used with NVIDIA GPU cards. Follow these steps to build the Model Server from source, including the NVIDIA plugin from the openvino_contrib repo. Learn more about using the NVIDIA plugin.
Breaking changes
- The CLI parameter has been renamed to reflect the interval time unit: custom_node_resources_cleaner_interval_seconds. The default value should be optimal for most use cases.
- Support for the HDDL/NCS plugins is temporarily unavailable. It will return in the next release.
Deprecated functionality
- Plugin config parameters from OpenVINO API 1.0 – OpenVINO Model Server can be tuned using plugin config parameters. Until now, the parameter names were defined by OpenVINO API 1.0. It is recommended to start using the parameter names defined in OpenVINO API 2.0 (for example, CPU_THROUGHPUT_STREAMS can be replaced with the API 2.0 NUM_STREAMS property). In this release, old parameter names are automatically translated to their new equivalents. Check the performance tuning guide and more info about the plugin parameters.
Bug fixes
- Improved performance for DAG pipelines executed on GPU accelerators
- The default values of performance tuning parameters were not calculated correctly inside Docker containers with constrained CPU capacity. Now, the optimal number of streams for THROUGHPUT mode is set based on the CPU allocation bound to the container.
- Fixed unit tests that sporadically raised false positive errors.
Other changes:
- Published a binary package of OpenVINO Model Server which can be used for deployments on bare-metal hosts without Docker containers. See the instructions for bare-metal deployment.
- Updated software dependencies and container base images
You can use a public OpenVINO Model Server Docker image based on Ubuntu via one of the following commands:
docker pull openvino/model_server:2022.3
docker pull openvino/model_server:2022.3-gpu
or use the provided binary packages.
OpenVINO™ Model Server 2022.2
The 2022.2 version is a major release with the new OpenVINO backend API (Application Programming Interface).
New features
KServe gRPC API
Besides the TensorFlow Serving API, it is now possible to call OpenVINO Model Server using the KServe API. The following gRPC methods are implemented: ModelInfer, ModelMetadata, ModelReady, ServerLive, ServerReady and ServerMetadata.
Inference execution supports input both in the raw_input_contents format and as InferTensorContents.
The same clients can be used to connect to OpenVINO Model Server as to other KServe-compatible model servers. Check the samples using the Triton client library in Python.
KServe REST API – feature preview
In addition to the TensorFlow Serving REST API, we have also implemented the KServe REST API. The following endpoints are functional:
v2
v2/health/live
v2/health/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/infer
Besides the standard tensor_data input format, the binary extension compatible with the Triton Inference Server is also implemented. That way, the data can be sent either as arrays in JSON or as JPEG/PNG-encoded content.
Check how to connect to the KServe API in the samples using the Triton client library in Python.
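A minimal sketch of an inference call with tensor data placed directly in the JSON body, using the requests library (address, model name, input name and shape are placeholders):
# Requires: pip install requests
import requests

# Assumes the OVMS REST endpoint listens on localhost:8000
url = "http://localhost:8000/v2/models/my_model/infer"
payload = {
    "inputs": [
        {"name": "input", "shape": [1, 10], "datatype": "FP32", "data": [0.0] * 10}
    ]
}
response = requests.post(url, json=payload)
print(response.json())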
Execution metrics – feature preview
OpenVINO Model Server can now expose metrics compatible with Prometheus format. Metrics can be enabled in the server configuration file or using a command line parameter.
The following metrics are now available:
ovms_streams
ovms_current_requests
ovms_requests_success
ovms_requests_fail
ovms_request_time_us
ovms_inference_time_us
ovms_wait_for_infer_req_time_us
ovms_infer_req_queue_size
ovms_infer_req_active
Metrics can be integrated with Grafana reports or with a horizontal autoscaler.
Learn more about using metrics.
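Once enabled, the metrics can be scraped like any other Prometheus endpoint. A minimal sketch, assuming the metrics are exposed under /metrics on the REST port:
# Requires: pip install requests
import requests

# Assumption: metrics are enabled and served on the REST port under /metrics
response = requests.get("http://localhost:8000/metrics")
for line in response.text.splitlines():
    if line.startswith("ovms_requests_success"):
        print(line)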
Direct support for PaddlePaddle models
OpenVINO Model Server now includes a PaddlePaddle model importer. Models trained in the PaddlePaddle framework can be placed directly in the model repository and served.
Check the demo showing how to deploy and use the segmentation model ocrnet-hrnet-w48-paddle in PaddlePaddle format.
Performance improvements in DAG execution
In several scenarios, pipeline execution was improved to reduce data copy operations. This is perceived as reduced latency and increased overall throughput.
Exemplary custom nodes are included in the OpenVINO Model Server public docker image.
This simplifies deploying pipelines based on the exemplary custom nodes. Previously, it was required to compile the custom node and mount it into the container during deployment. Now, those libraries are added to the public Docker image. Demos that include custom nodes now offer an option to use the precompiled version from the image or to build them from source. Check the demo of the horizontal text detection pipeline.
Breaking changes
Changed the sequence of starting REST/gRPC endpoints vs initial loading of the models.
With this version, the model server starts the gRPC and REST endpoints (if enabled) before the models are loaded. Before this change, an active network interface acted as the readiness indicator. Now, server readiness and model readiness can be checked using the dedicated endpoints of the KServe API:
v2/health/ready
v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/ready
This makes it easier to monitor the state of models during the initialization phase.
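A small sketch of waiting for server and model readiness before sending traffic (the REST address and model name are placeholders):
# Requires: pip install requests
import time
import requests

base = "http://localhost:8000"  # placeholder REST address

def ready(path):
    try:
        return requests.get(base + path, timeout=1).status_code == 200
    except requests.ConnectionError:
        return False

# Wait for the server first, then for a specific model
while not ready("/v2/health/ready"):
    time.sleep(1)
while not ready("/v2/models/my_model/ready"):
    time.sleep(1)
print("server and model are ready")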
Updated the OpenCV version used in the model server to 4.6.0
This impacts custom node compatibility. Any custom nodes using OpenCV for custom image transformations should be recompiled. Check the recommended process for building custom nodes in a Docker container in our examples.
Bug Fixes:
- Minor fixes in logging
- Fixed configuring warning log level
- Fixes in documentation
- Security fixes
You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2022.2
or
docker pull openvino/model_server:2022.2-gpu
OpenVINO™ Model Server 2022.1
The 2022.1 version is a major release with the new OpenVINO backend API (Application Programming Interface). It includes several new features and a few breaking changes.
New features
- Support for dynamic shape in the models
Allows configuring model inputs to accept a range of input shape dimensions and a variable batch size. This enables sending predict requests with various image resolutions and batch sizes (see the sketch after this list).
- Model cache for faster loading and initialization
The cached files make Model Server initialization faster when performing subsequent model loading. Cache files can be reused within the same Model Server version, target device, hardware, model, model version, model shape and plugin config parameters.
- Support for double precision
OVMS now supports two additional precisions: FP64 and I64.
- Extended API for the Directed Acyclic Graph scheduler custom nodes to include initialization and cleanup steps
This enables additional use cases where you can initialize resources in the DAG loading step instead of during each predict request. For example, it makes it possible to avoid dynamic allocation during custom node execution.
- Easier deployment of models with layout from training frameworks
If a model has information about its layout, this information is preserved in OVMS. OpenVINO Model Optimizer can be instructed to save information about the model layout.
- Arbitrary layout transpositions
Added support for handling any layout transformation when loading models. This results in adding a preprocessing step before inference. It is configured using --layout NCHW:NHWC, which informs OVMS that the model natively accepts NHWC layout and that a preprocessing step transposing from NCHW should be added to accept such inputs.
- Support for models with batch size on arbitrary dimension
The batch size in the layout can now be on any position in the model. Previously, when changing the model batch size, OVMS accepted the batch size only on the first dimension.
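As an illustration of the dynamic shape feature, a client can send requests with different resolutions to the same served model; a minimal sketch with the ovmsclient library (address, model name and input name are placeholders, and the model must be configured with a dynamic shape):
# Requires: pip install ovmsclient numpy
import numpy as np
from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")  # placeholder gRPC address

# Two requests with different spatial resolutions handled without a model reload
for size in (224, 320):
    data = np.zeros((1, 3, size, size), dtype=np.float32)
    outputs = client.predict(inputs={"input": data}, model_name="my_model")
    print("inference with input shape", data.shape, "succeeded")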
Breaking changes
- Order of reshape and layout change operations during model initialization.
In previous OVMS versions, the reshape was applied first and the layout change afterwards. In this release, OVMS handles the order of operations for the user, and it is required to specify the expected final shape and the expected transposition to be added. Previously, to change a model with original shape (1,3,200,200) and layout NCHW to handle a different layout and resolution, you had to set --shape "(1,3,224,224)" --layout NHWC. Now both parameters should describe the target values, so with 2022.1 it should look like: --shape "(1,224,224,3)" --layout NHWC:NCHW.
- Layout parameter changes
Previously, when configuring a model with the --layout parameter, the administrator was not required to know the underlying model layout because OpenVINO used NCHW by default. Now the parameter --layout NCHW informs OVMS that the model is using layout NCHW – both that the model uses NCHW and that it accepts NCHW input.
- Custom node code must include an implementation of the new API methods. It might be a dummy implementation if not needed. Additionally, all previous API functions must include an additional void* parameter.
- In the DAG pipelines configuration, demultiplexing with a dynamic number of parallel operations is configurable with the parameter "dynamic_count" set to -1, in addition to 0 as before. This is more consistent with common conventions used e.g. in model input shapes. Using 0 is now deprecated and support for it will be removed in the following releases.
Other changes:
- Updated demo with question answering use case – BERT model demo with dynamic shape and variable length of the request content
- Rearranged structure of the demos and client code examples.
- Python client code examples both with the tensorflow-serving-api and ovmsclient libraries.
- Demos updated to use models with preserved layout and color format
- Custom nodes updated to use the new API. The initialization step in the model zoo custom node uses memory buffer initialization to speed up execution.
Bug Fixes:
- Fixed an issue with loading cloud-stored models. Occasionally, a downloaded model would not load properly.
- Fixes in documentation
- Security fixes
You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2022.1
or
docker pull openvino/model_server:2022.1-gpu
OpenVINO™ Model Server 2021.4.2
The 2021.4.2 version is a hotfix release for the OpenVINO Model Server. It includes a few bug fixes and enhancements in the exemplary clients.
Bug fixes:
- Fixed an issue with inference execution on the NCS stick which allows loading multiple models at the same time. Now, in config mode, multiple models can be passed to the NCS device via the parameter --target_device MYRIAD.
- Documented docker container deployment with the NCS stick without the privileged mode.
- Fixed handling of parameters including nested double quote " characters in the startup options in the docker container with an nginx proxy. It was impacting parameters like --plugin_config '{"CPU_THROUGHPUT_STREAMS":"1"}'.
- Improved handling of OpenVINO plugin config parameters. Previously, a wrong type for a plugin parameter value did not return an error, so it was easy to miss the fact that the parameter was ignored. Now the device plugin configuration accepts numerical values both with and without quotes: --plugin_config '{"CPU_THROUGHPUT_STREAMS":"1"}' and --plugin_config '{"CPU_THROUGHPUT_STREAMS":1}' are both fine now. An invalid format of the value will raise an error.
- The parameters for changing the layout and shape with multiple inputs/outputs will use the updated model tensor name as defined in the mapping_config.json file. It refers to a format like {"input1":"NHWC","input2":"NHWC"}.
- External contribution to the custom node model_zoo_intel_object_detection – added a labels output in the Directed Acyclic Graph Scheduler custom node. Now, the output also includes the labels from an object detection model.
- Security related updates
Exemplary client improvements:
- OVMS demo with a BERT model – question answering Python application
- C++ async client example – demonstrates how to connect to the model server from a C++ application; it can also be a convenient tool to test OVMS performance with execution concurrency.
- Golang client – a demonstration of how to connect to OVMS via the gRPC protocol from a Golang application
- Updated the Optical Character Recognition pipeline example to use a combination of the east-resnet50 model with a text recognition model from the OpenVINO Model Zoo
You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2021.4.2
or
docker pull openvino/model_server:2021.4.2-gpu
OpenVINO™ Model Server 2021.4.1
The 2021.4.1 version of Model Server is primarily a hotfix release. It also includes a preview version of the simplified Python client library and a sample client written in C++. We also added a version of the model server Docker image based on Ubuntu 20.04. The public Docker image on Docker Hub now uses the Ubuntu 20.04 base OS. The model server image based on CentOS 7 will be discontinued starting from the next release.
Bug Fixes:
- Removed a limitation in the DAG configuration which required the pipeline input to be connected to at least one neural network model while using the binary input format. Now the input can also be connected exclusively to a custom node. An example of such a use case is documented in ovms_onnx_example.
- Removed an invalid error message in the server logs while loading models from Google Cloud Storage.
- Fixed a very rare race condition preventing detection of updates in the configuration file.
- Improvements in the error messages reporting an invalid DAG pipeline configuration with unmatched data shape between nodes.
- Corrected the reported model state in model status queries in the loading error condition. When the model cannot be loaded, it will now report the status Loading>Error instead of End>OK.
- The model server was ignoring incorrect parameters in the configuration file, typically in case of a spelling mistake in a valid parameter name. Now an error will be raised when an invalid parameter is defined.
- Corrected issue related to a scenario with demultiplexed output connected both to a custom node and a neural network model (DL node).
Python client library – the lightweight client library provides a simplified mechanism to communicate with OVMS and TensorFlow Serving. Unlike tensorflow-serving-api, it does not include TensorFlow as a dependency, which reduces its size dramatically. It also has a simplified API which allows sending prediction requests with just a few commands. Currently the gRPC protocol is included; REST API support is to be added. Learn more in the client lib documentation.
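A minimal sketch of the simplified library usage (the gRPC address, model name and input name are placeholders; see the client lib documentation for the full API):
# Requires: pip install ovmsclient numpy
import numpy as np
from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")  # placeholder gRPC address

# Query metadata of the served model, then run a prediction
print(client.get_model_metadata(model_name="my_model"))

data = np.zeros((1, 3, 224, 224), dtype=np.float32)
result = client.predict(inputs={"input": data}, model_name="my_model")
print(type(result))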
C++ client example – a client code example compatible with OVMS and TensorFlow Serving. It can run predict requests both with JPEG/PNG images and with arrays converted to the tensor_content format. It includes a recipe for building it using Bazel and a Dockerfile. Learn more in the example documentation.
You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2021.4.1
or
docker pull openvino/model_server:2021.4.1-gpu
OpenVINO™ Model Server 2021.4
The 2021.4 release of OpenVINO™ Model Server includes the following new features and bug fixes:
New Features:
- Binary input data – ability to send inference requests using data in a compressed format like JPEG or PNG, significantly reducing communication bandwidth. There is a noticeable performance improvement, especially with REST API prediction calls and image data (see the sketch after this list). For more details, see the documentation.
- Dynamic batch size without model reloading – it is now possible to run inference with arbitrary batch sizes using input demultiplexing and splitting execution into parallel streams. This feature enables inference execution with the OpenVINO Inference Engine without the side effect of changing the batch size for sequential requests and reloading models at runtime. For more details, see the documentation.
- Practical examples of custom nodes – new or updated custom nodes: model zoo object detection, Optical Character Recognition and image transformation. These custom nodes can be used in a range of applications like vehicle object detection combined with recognition or OCR pipelines. Learn more about the DAG Scheduler and custom nodes in the documentation.
- Change model input and output layouts at runtime – it is now possible to change the model layout at runtime to NHWC. Source images are typically in HWC layout, and this layout is used by image transformation libraries. Using the same layout in the model simplifies linking custom nodes with image transformations and avoids data transposing. It also reduces the load on clients and the overall latency for inference requests. Learn more
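A rough sketch of sending a JPEG file as binary input over the TensorFlow Serving gRPC API (the address, model name and input name are placeholders; the encoded bytes are packed into a string tensor which the server decodes on its side):
# Requires: pip install tensorflow-serving-api
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:9000")  # placeholder gRPC address
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

with open("image.jpeg", "rb") as f:
    img_bytes = f.read()

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"
# The encoded image is sent as a string tensor with batch size 1
request.inputs["input"].CopyFrom(tf.make_tensor_proto([img_bytes], shape=[1]))

response = stub.Predict(request, timeout=10.0)
print(list(response.outputs.keys()))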
Bug Fixes:
- Access to public S3 buckets without authentication was not functional. Now models in public S3 buckets can be loaded without credentials.
- Configuration Reload API calls did not update the models when the Model Server was started with a missing model repository.
- Configuration file validation accepted illegal shape configurations; this is corrected now and a proper error is logged.
- ONNX models with dynamic shapes could not be loaded even after defining the shape in the configuration file.
- DAG Scheduler pipelines could not be created with connections between nodes where one has a dynamic and one a static shape.
- Custom loader did not detect and apply configuration changes correctly at runtime.
- Unhandled exception while loading unsupported models on HDDL devices.
OpenVINO™ Toolkit Operator for OpenShift
The OpenVINO™ Toolkit Operator for OpenShift 0.2.0 is included in the 2021.4 release. It has been renamed and has the following enhancements compared to the previous OpenVINO™ Model Server Operator 0.1.0 released with 2021.3:
- The Custom Resource for managing the instances of OpenVINO™ Model Server is renamed from Ovms to ModelServer.
- ModelServer resources can now manage additional parameters: annotations, batch_size, shape, model_version_policy, file_system_poll_wait_seconds, stateful, node_selector, and layout. For a list of all parameters, see the documentation.
- The new Operator integrates OpenVINO™ Toolkit with OpenShift Data Science – a managed service for data scientists and AI developers offered by Red Hat. The Operator automatically builds a Notebook image in OpenShift which integrates OpenVINO™ Toolkit's developer tools and tutorials with the JupyterHub spawner.
- Operator 0.2.0 is currently available for OpenShift only. Updates to the Kubernetes Operator will be included in a future release.
You can use an OpenVINO™ Model Server public Docker image based on CentOS via the following command:
docker pull openvino/model_server:2021.4
or
docker pull openvino/model_server:2021.4-gpu
Deprecation notice
Starting with the 2022.1 OpenVINO™ Model Server release, Docker images will be based on Ubuntu instead of CentOS.
OpenVINO Model Server 2021.3
This is the third release of the OVMS C++ implementation. It uses the OpenVINO Inference Engine in the same version – 2021.3 – as a backend.
New capabilities and enhancements
- Custom Node support for the Directed Acyclic Graph Scheduler. Custom nodes in OpenVINO Model Server simplify linking deep learning models into a complete pipeline even if the inputs and outputs of the sequential models do not fit. In many cases, the output of one model cannot be directly passed to another one. The data might need to be analyzed, filtered or converted to a different format. Those operations cannot be easily implemented in AI frameworks or are simply not supported. Custom nodes address this challenge. They allow employing a dynamic library developed in C++ or C to perform arbitrary data transformations.
- DAG demultiplexing – the Directed Acyclic Graph Scheduler allows creating pipelines with node output demultiplexing into separate sub-outputs and branched pipeline execution. It can improve execution performance and address scenarios where any number of intermediate batches produced by custom nodes can be processed separately and collected at any graph stage.
- Exemplary custom node for an OCR pipeline – a use case for custom nodes and execution demultiplexing has been demonstrated in an OCR pipeline. It combines the east-resnet50 model with a CRNN model for complete text detection and text recognition. This custom node analyzes the response of the east-resnet50 model. Based on the inference results and the original image, it generates a list of detected boxes for text recognition. Each image in the output is resized to the predefined target size to fit the next inference model in the DAG pipeline (CRNN).
- Support for stateful models – a stateful model recognizes dependencies between consecutive inference requests. It maintains state between inference requests so that the next inference depends on the results of previous ones. OVMS now allows submitting inference requests in the context of a specific sequence. OVMS stores the model state and returns prediction results based on the history of requests from the client.
- Control API – extended the REST API to provide functionality for triggering OVMS configuration updates. The endpoint config/reload initiates applying configuration changes and reloading models. It ensures changes in configuration are deployed at a specific time and also gives confirmation about the reload operation status. The endpoint /config reports all served models and their versions. This simplifies model usage from the client side and connection troubleshooting (see the sketch after this list).
- Helm chart enhancements – added multiple configuration options for deployments with new scenarios: new model storage classes, Kubernetes resource restrictions, security context. Fixed defects with large-scale deployments.
- Kubernetes Operator – enabled OVMS deployments using the Kubernetes Operator for OVMS. This offering can be used to simplify management of OVMS services at scale in OpenShift and in open source Kubernetes. It is published on OperatorHub.
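A minimal sketch of calling the Control API with the requests library (the REST address is a placeholder and the /v1 prefix is an assumption; check the documentation for the exact paths):
# Requires: pip install requests
import requests

base = "http://localhost:8000"  # placeholder REST address

# Trigger applying configuration changes and reloading models (assumed /v1 prefix)
reload_response = requests.post(base + "/v1/config/reload")
print(reload_response.status_code, reload_response.text)

# List all served models and their versions
print(requests.get(base + "/v1/config").text)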
You can use an OpenVINO Model Server public Docker image based on CentOS* via the following command:
docker pull openvino/model_server:2021.3
or
docker pull openvino/model_server:2021.3-gpu
OpenVINO Model Server 2021.2.1
OpenVINO Model Server 2021.2.1 is a hotfix release without any new features or functionality changes. It uses OpenVINO Inference Engine version 2021.2.
It addresses the following bugs:
- Incorrect version management for corrupted or invalid models – when the model files were invalid or incomplete, OVMS could serve an incorrect version or stop serving all model versions. Now, versions with invalid model files will be ignored. The model version policy will apply only to valid models.
- Sporadic OVMS crash after online update of the configuration file under very heavy load from DAG prediction calls.
- Incorrect response from GetModelMetadata after online model configuration change
- Incorrect parsing of OVMS parameters in quotes in the docker image with nginx reverse proxy for clients mTLS authorization
- Multiple pipelines with an identical name could be configured – this is now prevented by configuration validation
- Minor issues in documentation
You can use an OpenVINO Model Server public Docker image based on CentOS* via the following command:
docker pull openvino/model_server:2021.2.1
or
docker pull openvino/model_server:2021.2.1-gpu
OpenVINO Model Server 2021.2
This is the second release of the OVMS C++ implementation. It includes several new features, enhancements and bug fixes. It uses the OpenVINO Inference Engine in the same version – 2021.2 – as a backend.
New capabilities and enhancements
- Directed Acyclic Graph (DAG) scheduler – (formerly models ensemble) this feature was first available as a preview in 2021.1. It is now officially supported, making it possible to define inference pipelines composed of multiple interconnected models that respond to a single prediction request. In this release we are adding support for the remaining API calls which were not supported for DAGs in the preview, specifically GetModelStatus and GetModelMetadata. GetModelStatus returns the status of the complete pipeline, while GetModelMetadata returns the pipeline input and output parameters. The new 2021.2 release also has improved DAG config validation.
- Direct import of ONNX models – it is now possible to serve ONNX models without converting them to the Intermediate Representation (IR) format. This helps simplify deployments using ONNX models and the PyTorch training framework.
- Custom loaders and integration with OpenVINO™ Security Add-on – it is now possible to define a custom library to handle model loading operations – including additional steps related to model decryption and license verification. Review the documentation of the Security Add-on component to learn about controlled access to the models.
- Traffic Encryption – new deployment recipe for client authorization via mTLS certificates and traffic encryption by integrating with NGINX reverse proxy in a Docker container.
- Remote Model Caching from cloud storage – models stored in Google Cloud Storage (GCS), Amazon S3 and Azure blob will no longer be downloaded multiple times after configuration changes that require model reloading. Cached model(s) will be used during the model reload operation. When a served model is changed, only the corresponding new version folder will be added to the model storage.
- Updated versions of several third-party dependencies
Fixed bugs
- Sporadic short unavailability of the default version when the model is switching to a newer one
- REST API not working with rest_workers=1 – now there is a clear error message about the invalid value. By default, the number of REST worker threads is adjusted automatically based on the number of CPUs
- Prevented a service crash when the shape parameter is out of the integer range
Known issues
- A model version upgrade might fail when the new model files are corrupted, while older versions might already be unloaded according to the model version policy
- OVMS might sporadically fail under very heavy DAG execution load during an online update of the pipeline models configuration. Predictions for individual models are not impacted.
You can use an OpenVINO Model Server public Docker image based on CentOS* via the following command:
docker pull openvino/model_server:2021.2
or
docker pull openvino/model_server:2021.2-gpu
OpenVINO Model Server 2021.1
This is a major release of OpenVINO Model Server. It is a completely rewritten implementation of the serving component. Upgrading from the Python-based version (2020.4) to the C++ implementation (2021.1) should be mostly transparent. There are no changes required on the client side. The exposed API is unchanged, but some configuration settings and deployment methods might need to be slightly adjusted.
Key New Features and Enhancements
- Much higher scalability in a single service instance. You can now utilize the full capacity of the available hardware. Expect linear scalability when introducing additional resources while avoiding any bottleneck on the frontend.
- Lower latency between the client and the server. This is especially noticeable with high performance accelerators or CPUs.
- Reduced footprint. By switching to C++ and reducing dependencies, the Docker image is reduced to ~400MB (for CPU, NCS and HDDL support) and ~800MB (for the image including also iGPU support).
- Reduced RAM usage. Thanks to reduced number of external software dependencies, OpenVINO Model Server allocates less memory on start up.
- Easier deployment on bare-metal or inside a Docker container.
- Support for online model updates. The server monitors configuration file changes and reloads models as needed without restarting the service.
- Model ensemble (preview). Connect multiple models to deploy complex processing solutions and reduce overhead of sending data back and forth.
- Azure Blob Storage support. From now on you can host your models in Azure Blob Storage containers.
- Updated helm chart for easy deployment in Kubernetes
Changes in version 2021.1
Moving from 2020.4 to 2021.1 introduces a few changes and optimizations which primarily impact the server deployment and configuration process. These changes are documented below.
- Docker Container Entrypoint
To simplify deployment with containers, a Docker image entrypoint was added. Now the container startup requires only parameters specific to the Model Server executable:
Old command:
docker run -d -v $(pwd)/model:/models/my_model/ -e LOG_LEVEL=DEBUG -p 9000:9000 openvino/model_server /ie-serving-py/start_server.sh ie_serving model --model_path /models/face-detection --model_name my_model --port 9000 --shape auto
New command:
docker run -d -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000 --shape auto --log_level DEBUG
- Simplified Command Line Parameters
Subcommands model and config are no longer used. Single-model or multi-model serving mode is determined based on whether --config_path or --model_name is defined. --config_path and --model_name are mutually exclusive.
- Changed default THROUGHPUT_STREAMS settings for the CPU and GPU device plugins
In the Python implementation, the default configuration was optimized for minimal latency with a single stream of inference requests. In version 2021.1, the default configuration for server concurrency, CPU_THROUGHPUT_STREAMS and GPU_THROUGHPUT_STREAMS, is calculated automatically based on the available resources. It ensures both low latency and efficient parallel processing. If you need to serve the models only for a single client on high performance systems, set a parameter like below:
--plugin_config '{"CPU_THROUGHPUT_STREAMS":"1"}'
- Log Level and Log File Path
Instead of the environment variables LOG_LEVEL and LOG_PATH, the log level and path are now defined with command line parameters to simplify configuration.
--log_level DEBUG/INFO(default)/ERROR
- grpc_workers Parameter Meaning
In the Python implementation (2020.4 and below) this parameter defined the number of frontend threads. In the C++ implementation (2021.1 and above) it defines the number of internal gRPC server objects to increase the maximum bandwidth capacity. The default value of 1 should be sufficient for most scenarios. Consider tuning it if you expect very high load from multiple parallel clients.
- Model Data Type Conversion
In the Python implementation (2020.4 and below) input tensors with a data type different from the one expected by the model were automatically converted to the required data type. In some cases, such conversion impacted the overall performance of the inference request. In version 2021.1, the user input data type must be the same as the model input data type. The client receives an error indicating incorrect input data precision, which gives immediate feedback to correct the format.
- Proxy Settings
The no_proxy environment variable is not used with cloud storage for models. The http_proxy and https_proxy settings are common for all remote models deployed in OpenVINO Model Server. In case you need to deploy both models stored behind a proxy and models with direct access, run two instances of the model server. Refer to the troubleshooting guide to learn about known issues and workarounds.
- Default Docker security context
By default, the OpenVINO Model Server process starts inside the Docker container in the context of the ovms account with uid 5000. It was the root context in previous versions. The change enforces the best practice of minimal required permissions. In case you need to change the security context, use the --user flag of the docker run command.
Note: The Git history of the C++ development is stored on the main branch (new default). The Python implementation history is preserved on the master branch.
You can use an OpenVINO Model Server public Docker image based on CentOS* via the following command:
docker pull openvino/model_server:2021.1
or
docker pull openvino/model_server:2021.1-gpu