diff --git a/docs/notebooks/001-hello-world-with-output.rst b/docs/notebooks/001-hello-world-with-output.rst index 65dd7e75025ea1..797cc193e53221 100644 --- a/docs/notebooks/001-hello-world-with-output.rst +++ b/docs/notebooks/001-hello-world-with-output.rst @@ -1,7 +1,7 @@ Hello Image Classification ========================== -.. _top: + This basic introduction to OpenVINO™ shows how to do inference with an image classification model. @@ -15,6 +15,10 @@ created, refer to the `TensorFlow to OpenVINO <101-tensorflow-classification-to-openvino-with-output.html>`__ tutorial. + + +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/003-hello-segmentation-with-output.rst b/docs/notebooks/003-hello-segmentation-with-output.rst index 0bf7c3cb1e40d5..19c71e8924d361 100644 --- a/docs/notebooks/003-hello-segmentation-with-output.rst +++ b/docs/notebooks/003-hello-segmentation-with-output.rst @@ -1,7 +1,7 @@ Hello Image Segmentation ======================== -.. _top: + A very basic introduction to using segmentation models with OpenVINO™. @@ -12,6 +12,10 @@ Zoo `__ is used. ADAS stands for Advanced Driver Assistance Services. The model recognizes four classes: background, road, curb and mark. + + +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/004-hello-detection-with-output.rst b/docs/notebooks/004-hello-detection-with-output.rst index 5eefc6b0fc2ab7..b5b1a183e6072c 100644 --- a/docs/notebooks/004-hello-detection-with-output.rst +++ b/docs/notebooks/004-hello-detection-with-output.rst @@ -1,7 +1,7 @@ Hello Object Detection ====================== -.. _top: + A very basic introduction to using object detection models with OpenVINO™. @@ -18,6 +18,10 @@ corner, ``(x_max, y_max)`` are the coordinates of the bottom right bounding box corner and ``conf`` is the confidence for the predicted class. + + +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/101-tensorflow-classification-to-openvino-with-output.rst b/docs/notebooks/101-tensorflow-classification-to-openvino-with-output.rst index 20921ab81cc606..50b81fc51eade5 100644 --- a/docs/notebooks/101-tensorflow-classification-to-openvino-with-output.rst +++ b/docs/notebooks/101-tensorflow-classification-to-openvino-with-output.rst @@ -1,7 +1,7 @@ Convert a TensorFlow Model to OpenVINO™ ======================================= -.. _top: + | This short tutorial shows how to convert a TensorFlow `MobileNetV3 `__ @@ -13,7 +13,11 @@ Convert a TensorFlow Model to OpenVINO™ Runtime `__ and do inference with a sample image. -| **Table of contents**: + + +.. _top: + +**Table of contents**: - `Imports <#imports>`__ - `Settings <#settings>`__ diff --git a/docs/notebooks/102-pytorch-onnx-to-openvino-with-output.rst b/docs/notebooks/102-pytorch-onnx-to-openvino-with-output.rst index d5075dd2c3a45f..ac8ce1e7bdf452 100644 --- a/docs/notebooks/102-pytorch-onnx-to-openvino-with-output.rst +++ b/docs/notebooks/102-pytorch-onnx-to-openvino-with-output.rst @@ -1,7 +1,7 @@ Convert a PyTorch Model to ONNX and OpenVINO™ IR ================================================ -.. _top: + This tutorial demonstrates step-by-step instructions on how to do inference on a PyTorch semantic segmentation model, using OpenVINO @@ -35,6 +35,10 @@ plant, sheep, sofa, train, tv monitor** More information about the model is available in the `torchvision documentation `__ + + +.. 
_top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/102-pytorch-to-openvino-with-output.rst b/docs/notebooks/102-pytorch-to-openvino-with-output.rst index 11fbc0a95bf830..d2b0b57549bff5 100644 --- a/docs/notebooks/102-pytorch-to-openvino-with-output.rst +++ b/docs/notebooks/102-pytorch-to-openvino-with-output.rst @@ -1,7 +1,7 @@ Convert a PyTorch Model to OpenVINO™ IR ======================================= -.. _top: + This tutorial demonstrates step-by-step instructions on how to do inference on a PyTorch classification model using OpenVINO Runtime. @@ -31,6 +31,10 @@ but elevated to the design space level. The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes. + + +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/103-paddle-to-openvino-classification-with-output.rst b/docs/notebooks/103-paddle-to-openvino-classification-with-output.rst index a430a89c935940..5c8a9fefd2b88d 100644 --- a/docs/notebooks/103-paddle-to-openvino-classification-with-output.rst +++ b/docs/notebooks/103-paddle-to-openvino-classification-with-output.rst @@ -1,7 +1,7 @@ Convert a PaddlePaddle Model to OpenVINO™ IR ============================================ -.. _top: + This notebook shows how to convert a MobileNetV3 model from `PaddleHub `__, pre-trained @@ -16,6 +16,10 @@ IR model. Source of the `model `__. + + +.. _top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/104-model-tools-with-output.rst b/docs/notebooks/104-model-tools-with-output.rst index ef7c0b21a36b8c..47cd5fd7e26980 100644 --- a/docs/notebooks/104-model-tools-with-output.rst +++ b/docs/notebooks/104-model-tools-with-output.rst @@ -1,13 +1,15 @@ Working with Open Model Zoo Models ================================== -.. _top: + This tutorial shows how to download a model from `Open Model Zoo `__, convert it to OpenVINO™ IR format, show information about the model, and benchmark the model. +.. _top: + **Table of contents**: - `OpenVINO and Open Model Zoo Tools <#openvino-and-open-model-zoo-tools>`__ diff --git a/docs/notebooks/105-language-quantize-bert-with-output.rst b/docs/notebooks/105-language-quantize-bert-with-output.rst index 772e22f0cce4d5..61c8d152e6bcef 100644 --- a/docs/notebooks/105-language-quantize-bert-with-output.rst +++ b/docs/notebooks/105-language-quantize-bert-with-output.rst @@ -1,7 +1,7 @@ Quantize NLP models with Post-Training Quantization ​in NNCF ============================================================ -.. _top: + This tutorial demonstrates how to apply ``INT8`` quantization to the Natural Language Processing model known as @@ -24,6 +24,10 @@ and datasets. It consists of the following steps: - Compare the performance of the original, converted and quantized models. + + +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/106-auto-device-with-output.rst b/docs/notebooks/106-auto-device-with-output.rst index ac1e5303ac0f6e..b1e37e02f7a376 100644 --- a/docs/notebooks/106-auto-device-with-output.rst +++ b/docs/notebooks/106-auto-device-with-output.rst @@ -1,8 +1,6 @@ Automatic Device Selection with OpenVINO™ ========================================= -.. _top: - The `Auto device `__ (or AUTO in short) selects the most suitable device for inference by @@ -32,6 +30,10 @@ first inference. auto + + +.. 
_top: + **Table of contents**: - `Import modules and create Core <#import-modules-and-create-core>`__ diff --git a/docs/notebooks/107-speech-recognition-quantization-data2vec-with-output.rst b/docs/notebooks/107-speech-recognition-quantization-data2vec-with-output.rst index 30edff38325fec..313abe78024581 100644 --- a/docs/notebooks/107-speech-recognition-quantization-data2vec-with-output.rst +++ b/docs/notebooks/107-speech-recognition-quantization-data2vec-with-output.rst @@ -1,8 +1,6 @@ Quantize Speech Recognition Models using NNCF PTQ API ===================================================== -.. _top: - This tutorial demonstrates how to use the NNCF (Neural Network Compression Framework) 8-bit quantization in post-training mode (without the fine-tuning pipeline) to optimize the speech recognition model, @@ -21,6 +19,10 @@ steps: - Compare performance of the original and quantized models. - Compare Accuracy of the Original and Quantized Models. + + +.. _top: + **Table of contents**: - `Download and prepare model <#download-and-prepare-model>`__ diff --git a/docs/notebooks/108-gpu-device-with-output.rst b/docs/notebooks/108-gpu-device-with-output.rst index fc81855e8c76bd..fc236c92b8331e 100644 --- a/docs/notebooks/108-gpu-device-with-output.rst +++ b/docs/notebooks/108-gpu-device-with-output.rst @@ -1,6 +1,8 @@ Working with GPUs in OpenVINO™ ============================== + + .. _top: **Table of contents**: diff --git a/docs/notebooks/109-latency-tricks-with-output.rst b/docs/notebooks/109-latency-tricks-with-output.rst index 2d3e64bb15f37e..af706167522e3b 100644 --- a/docs/notebooks/109-latency-tricks-with-output.rst +++ b/docs/notebooks/109-latency-tricks-with-output.rst @@ -1,8 +1,6 @@ Performance tricks in OpenVINO for latency mode =============================================== -.. _top: - The goal of this notebook is to provide a step-by-step tutorial for improving performance for inferencing in a latency mode. Low latency is especially desired in real-time applications when the results are needed @@ -51,6 +49,10 @@ optimize performance on OpenVINO IR files in A similar notebook focused on the throughput mode is available `here <109-throughput-tricks-with-output.html>`__. + + +.. _top: + **Table of contents**: - `Data <#data>`__ diff --git a/docs/notebooks/109-throughput-tricks-with-output.rst b/docs/notebooks/109-throughput-tricks-with-output.rst index b742d7b30e5535..523bb307ed2cf3 100644 --- a/docs/notebooks/109-throughput-tricks-with-output.rst +++ b/docs/notebooks/109-throughput-tricks-with-output.rst @@ -1,7 +1,7 @@ Performance tricks in OpenVINO for throughput mode ================================================== -.. _top: + The goal of this notebook is to provide a step-by-step tutorial for improving performance for inferencing in a throughput mode. High @@ -46,6 +46,10 @@ optimize performance on OpenVINO IR files in A similar notebook focused on the latency mode is available `here <109-latency-tricks-with-output.html>`__. + + +.. _top: + **Table of contents**: - `Data <#data>`__ diff --git a/docs/notebooks/110-ct-scan-live-inference-with-output.rst b/docs/notebooks/110-ct-scan-live-inference-with-output.rst index e55cb96fc34852..9ae34d9db77f08 100644 --- a/docs/notebooks/110-ct-scan-live-inference-with-output.rst +++ b/docs/notebooks/110-ct-scan-live-inference-with-output.rst @@ -1,8 +1,6 @@ Live Inference and Benchmark CT-scan Data with OpenVINO™ ======================================================== -.. 
_top: - Kidney Segmentation with PyTorch Lightning and OpenVINO™ - Part 4 ----------------------------------------------------------------- @@ -30,6 +28,10 @@ notebook. For demonstration purposes, this tutorial will download one converted CT scan to use for inference. + + +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst b/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst index a1b277faa9f2a9..898fa83b906f45 100644 --- a/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst +++ b/docs/notebooks/110-ct-segmentation-quantize-nncf-with-output.rst @@ -1,8 +1,6 @@ Quantize a Segmentation Model and Show Live Inference ===================================================== -.. _top: - Kidney Segmentation with PyTorch Lightning and OpenVINO™ - Part 3 ----------------------------------------------------------------- @@ -55,6 +53,10 @@ demonstration purposes, this tutorial will download one converted CT scan and use that scan for quantization and inference. For production purposes, use a representative dataset for quantizing the model. + + +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/111-yolov5-quantization-migration-with-output.rst b/docs/notebooks/111-yolov5-quantization-migration-with-output.rst index 83440d3943312e..b297f3425dac14 100644 --- a/docs/notebooks/111-yolov5-quantization-migration-with-output.rst +++ b/docs/notebooks/111-yolov5-quantization-migration-with-output.rst @@ -1,8 +1,6 @@ Migrate quantization from POT API to NNCF API ============================================= -.. _top: - This tutorial demonstrates how to migrate quantization pipeline written using the OpenVINO `Post-Training Optimization Tool (POT) `__ to `NNCF Post-Training Quantization API `__. @@ -23,6 +21,9 @@ The tutorial consists from the following parts: 7. Compare performance FP32 and INT8 models + +.. _top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/112-pytorch-post-training-quantization-nncf-with-output.rst b/docs/notebooks/112-pytorch-post-training-quantization-nncf-with-output.rst index 7aa3722b2cb442..4b054d59389fe6 100644 --- a/docs/notebooks/112-pytorch-post-training-quantization-nncf-with-output.rst +++ b/docs/notebooks/112-pytorch-post-training-quantization-nncf-with-output.rst @@ -1,8 +1,6 @@ Post-Training Quantization of PyTorch models with NNCF ====================================================== -.. _top: - The goal of this tutorial is to demonstrate how to use the NNCF (Neural Network Compression Framework) 8-bit quantization in post-training mode (without the fine-tuning pipeline) to optimize a PyTorch model for the @@ -27,6 +25,9 @@ quantization, not demanding the fine-tuning of the model. notebook. + +.. _top: + **Table of contents**: - `Preparations <#preparations>`__ diff --git a/docs/notebooks/113-image-classification-quantization-with-output.rst b/docs/notebooks/113-image-classification-quantization-with-output.rst index 25fc172a7e6f85..95f2f7695c1078 100644 --- a/docs/notebooks/113-image-classification-quantization-with-output.rst +++ b/docs/notebooks/113-image-classification-quantization-with-output.rst @@ -1,7 +1,7 @@ Quantization of Image Classification Models =========================================== -.. 
_top: + This tutorial demonstrates how to apply ``INT8`` quantization to Image Classification model using @@ -21,6 +21,8 @@ This tutorial consists of the following steps: - Compare performance of the original and quantized models. - Compare results on one picture. +.. _top: + **Table of contents**: - `Prepare the Model <#prepare-the-model>`__ diff --git a/docs/notebooks/115-async-api-with-output.rst b/docs/notebooks/115-async-api-with-output.rst index bf666cedae1d8f..f8b9eb906a5a50 100644 --- a/docs/notebooks/115-async-api-with-output.rst +++ b/docs/notebooks/115-async-api-with-output.rst @@ -1,7 +1,7 @@ Asynchronous Inference with OpenVINO™ ===================================== -.. _top: + This notebook demonstrates how to use the `Async API `__ @@ -14,6 +14,8 @@ in parallel (for example, populating inputs or scheduling other requests) rather than wait for the current inference to complete first. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/116-sparsity-optimization-with-output.rst b/docs/notebooks/116-sparsity-optimization-with-output.rst index cf3068f712710c..c5ccc6437e9a41 100644 --- a/docs/notebooks/116-sparsity-optimization-with-output.rst +++ b/docs/notebooks/116-sparsity-optimization-with-output.rst @@ -1,7 +1,7 @@ Accelerate Inference of Sparse Transformer Models with OpenVINO™ and 4th Gen Intel® Xeon® Scalable Processors ============================================================================================================= -.. _top: + This tutorial demonstrates how to improve performance of sparse Transformer models with `OpenVINO `__ on 4th @@ -21,6 +21,8 @@ consists of the following steps: integration with Hugging Face Optimum. - Compare sparse 8-bit vs. dense 8-bit inference performance. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/117-model-server-with-output.rst b/docs/notebooks/117-model-server-with-output.rst index edc2562eb7c27c..272e3b3bdca60d 100644 --- a/docs/notebooks/117-model-server-with-output.rst +++ b/docs/notebooks/117-model-server-with-output.rst @@ -1,7 +1,7 @@ Hello Model Server ================== -.. _top: + Introduction to OpenVINO™ Model Server (OVMS). @@ -33,6 +33,8 @@ deployment: |ovms_diagram| +.. _top: + **Table of contents**: - `Serving with OpenVINO Model Server <#serving-with-openvino-model-server1>`__ diff --git a/docs/notebooks/118-optimize-preprocessing-with-output.rst b/docs/notebooks/118-optimize-preprocessing-with-output.rst index 87e68a00acb3ca..cebce914098bd5 100644 --- a/docs/notebooks/118-optimize-preprocessing-with-output.rst +++ b/docs/notebooks/118-optimize-preprocessing-with-output.rst @@ -1,7 +1,7 @@ Optimize Preprocessing ====================== -.. _top: + When input data does not fit the model input tensor perfectly, additional operations/steps are needed to transform the data to the @@ -27,6 +27,8 @@ This tutorial include following steps: - Comparing results on one picture. - Comparing performance. +.. _top: + **Table of contents**: - `Settings <#settings>`__ diff --git a/docs/notebooks/119-tflite-to-openvino-with-output.rst b/docs/notebooks/119-tflite-to-openvino-with-output.rst index 726ec4fb5f1143..07d330269f823d 100644 --- a/docs/notebooks/119-tflite-to-openvino-with-output.rst +++ b/docs/notebooks/119-tflite-to-openvino-with-output.rst @@ -1,7 +1,7 @@ Convert a Tensorflow Lite Model to OpenVINO™ ============================================ -.. 
_top: + `TensorFlow Lite `__, often referred to as TFLite, is an open source library developed for deploying @@ -17,6 +17,8 @@ After creating the OpenVINO IR, load the model in `OpenVINO Runtime `__ and do inference with a sample image. +.. _top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/120-tensorflow-object-detection-to-openvino-with-output.rst b/docs/notebooks/120-tensorflow-object-detection-to-openvino-with-output.rst index 40b088044cfc5f..a66dcefb6a4884 100644 --- a/docs/notebooks/120-tensorflow-object-detection-to-openvino-with-output.rst +++ b/docs/notebooks/120-tensorflow-object-detection-to-openvino-with-output.rst @@ -1,7 +1,7 @@ Convert a TensorFlow Object Detection Model to OpenVINO™ ======================================================== -.. _top: + `TensorFlow `__, or TF for short, is an open-source framework for machine learning. @@ -26,6 +26,8 @@ After creating the OpenVINO IR, load the model in `OpenVINO Runtime `__ and do inference with a sample image. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/121-convert-to-openvino-with-output.rst b/docs/notebooks/121-convert-to-openvino-with-output.rst index 2a83c5fdc7c342..13bca81bd9e271 100644 --- a/docs/notebooks/121-convert-to-openvino-with-output.rst +++ b/docs/notebooks/121-convert-to-openvino-with-output.rst @@ -4,6 +4,8 @@ OpenVINO™ model conversion API This notebook shows how to convert a model from original framework format to OpenVINO Intermediate Representation (IR). +.. _top: + **Table of contents**: - `OpenVINO IR format <#openvino-ir-format>`__ diff --git a/docs/notebooks/122-speech-recognition-quantization-wav2vec2-with-output.rst b/docs/notebooks/122-speech-recognition-quantization-wav2vec2-with-output.rst new file mode 100644 index 00000000000000..4db1ac32fe921f --- /dev/null +++ b/docs/notebooks/122-speech-recognition-quantization-wav2vec2-with-output.rst @@ -0,0 +1,309 @@
+Quantize Speech Recognition Models with accuracy control using NNCF PTQ API
+===========================================================================
+
+
+
+This tutorial demonstrates how to apply ``INT8`` quantization with
+accuracy control to the speech recognition model, known as
+`Wav2Vec2 `__,
+using the NNCF (Neural Network Compression Framework) 8-bit quantization
+with accuracy control in post-training mode (without the fine-tuning
+pipeline). This notebook uses a fine-tuned
+`Wav2Vec2-Base-960h `__
+`PyTorch `__ model trained on the `LibriSpeech ASR
+corpus `__. The tutorial is designed to be
+extendable to custom models and datasets. It consists of the following
+steps:
+
+- Download and prepare the Wav2Vec2 model and LibriSpeech dataset.
+- Define data loading and accuracy validation functionality.
+- Quantize the model with accuracy control.
+- Compare accuracy of the original PyTorch model and the OpenVINO FP16
+  and INT8 models.
+- Compare performance of the original and quantized models.
+
+The advanced quantization flow allows applying 8-bit quantization to the
+model with control of the accuracy metric. This is achieved by keeping
+the most impactful operations within the model in the original precision.
+The flow is based on the `Basic 8-bit
+quantization `__
+and has the following differences:
+
+- Besides the calibration dataset, a validation dataset is required to
+  compute the accuracy metric. Both datasets can refer to the same data
+  in the simplest case.
+- A validation function, used to compute the accuracy metric, is
+  required. It can be a function that is already available in the
+  source framework or a custom function.
+- Since accuracy validation is run several times during the
+  quantization process, quantization with accuracy control can take
+  more time than the Basic 8-bit quantization flow.
+- The resulting model can provide a smaller performance improvement
+  than the Basic 8-bit quantization flow because some of the operations
+  are kept in the original precision.
+
+.. note::
+
+   Currently, 8-bit quantization with accuracy control in NNCF
+   is available only for models in OpenVINO representation.
+
+The steps for the quantization with accuracy control are described
+below.
+
+
+
+.. _top:
+
+**Table of contents**:
+
+- `Imports <#imports>`__
+- `Prepare the Model <#prepare-the-model>`__
+- `Prepare LibriSpeech Dataset <#prepare-librispeech-dataset>`__
+- `Prepare calibration and validation datasets <#prepare-calibration-and-validation-datasets>`__
+- `Prepare validation function <#prepare-validation-function>`__
+- `Run quantization with accuracy control <#run-quantization-with-accuracy-control>`__
+- `Model Usage Example <#model-usage-example>`__
+- `Compare Accuracy of the Original and Quantized Models <#compare-accuracy-of-the-original-and-quantized-models>`__
+
+
+.. code:: ipython2
+
+    # !pip install -q "openvino-dev>=2023.1.0" "nncf>=2.6.0"
+    !pip install -q "openvino==2023.1.0.dev20230811"
+    !pip install git+https://github.com/openvinotoolkit/nncf.git@develop
+    !pip install -q soundfile librosa transformers torch datasets torchmetrics
+
+Imports `⇑ <#top>`__
+###############################################################################################################################
+
+.. code:: ipython2
+
+    import numpy as np
+    import torch
+
+    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+
+Prepare the Model `⇑ <#top>`__
+###############################################################################################################################
+
+To instantiate the PyTorch model class, use the
+``Wav2Vec2ForCTC.from_pretrained`` method, providing the model ID for
+downloading from the Hugging Face hub. Model weights and configuration
+files are downloaded automatically on first use. Keep in mind that
+downloading the files can take several minutes and depends on your
+internet connection.
+
+Additionally, we can create a processor class, which is responsible for
+model-specific pre- and post-processing steps.
+
+.. code:: ipython2
+
+    BATCH_SIZE = 1
+    MAX_SEQ_LENGTH = 30480
+
+
+    torch_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h", ctc_loss_reduction="mean")
+    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
+
+Convert it to the OpenVINO Intermediate Representation (OpenVINO IR):
+
+.. code:: ipython2
+
+    import openvino
+
+
+    default_input = torch.zeros([1, MAX_SEQ_LENGTH], dtype=torch.float)
+    ov_model = openvino.convert_model(torch_model, example_input=default_input)
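+
+As an optional check (not part of the original notebook), you can
+inspect the converted model's inputs and outputs to confirm that the
+conversion succeeded:
+
+.. code:: ipython2
+
+    # Print the network boundary produced by convert_model; there should
+    # be one audio input traced from the [1, MAX_SEQ_LENGTH] example
+    # input and one logits output.
+    print(ov_model.inputs)
+    print(ov_model.outputs)
+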
+Prepare LibriSpeech Dataset `⇑ <#top>`__
+###############################################################################################################################
+
+For demonstration purposes, we will use a short dummy version of the
+LibriSpeech dataset, ``patrickvonplaten/librispeech_asr_dummy``, to
+speed up model evaluation. Model accuracy can differ from the accuracy
+reported in the paper. To reproduce the original accuracy, use the
+``librispeech_asr`` dataset.
+
+.. code:: ipython2
+
+    from datasets import load_dataset
+
+
+    dataset = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
+    test_sample = dataset[0]["audio"]
+
+
+    # define preprocessing function for converting audio to input values for model
+    def map_to_input(batch):
+        preprocessed_signal = processor(batch["audio"]["array"], return_tensors="pt", padding="longest", sampling_rate=batch['audio']['sampling_rate'])
+        input_values = preprocessed_signal.input_values
+        batch['input_values'] = input_values
+        return batch
+
+
+    # apply preprocessing function to dataset and remove audio column, to save memory as we do not need it anymore
+    dataset = dataset.map(map_to_input, batched=False, remove_columns=["audio"])
+
+Prepare calibration and validation datasets `⇑ <#top>`__
+###############################################################################################################################
+
+Here, the same dataset serves as both the calibration and the
+validation dataset, as noted above.
+
+.. code:: ipython2
+
+    import nncf
+
+
+    def transform_fn(data_item):
+        """
+        Extract the model's input from the data item.
+        The data item here is the data item that is returned from the data source per iteration.
+        This function should be passed when the data item cannot be used as the model's input.
+        """
+        return np.array(data_item["input_values"])
+
+
+    calibration_dataset = nncf.Dataset(dataset, transform_fn)
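+
+As a quick, optional sanity check (not part of the original flow), you
+can confirm that ``transform_fn`` produces arrays with a leading batch
+dimension, matching the example input used for conversion:
+
+.. code:: ipython2
+
+    # Each calibration item should have shape [1, num_audio_samples]
+    sample_input = transform_fn(dataset[0])
+    print(sample_input.shape, sample_input.dtype)
+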
+Prepare validation function `⇑ <#top>`__
+###############################################################################################################################
+
+Define the validation function.
+
+.. code:: ipython2
+
+    from torchmetrics import WordErrorRate
+    from tqdm.notebook import tqdm
+
+
+    def validation_fn(model, dataset):
+        """
+        Calculate and return a metric for the model.
+        """
+        wer = WordErrorRate()
+        for sample in tqdm(dataset):
+            # run infer function on sample
+            output = model.output(0)
+            logits = model(np.array(sample['input_values']))[output]
+            predicted_ids = np.argmax(logits, axis=-1)
+            transcription = processor.batch_decode(torch.from_numpy(predicted_ids))
+
+            # update metric on sample result
+            wer.update(transcription, [sample['text']])
+
+        result = wer.compute()
+
+        # NNCF treats a higher metric value as better during accuracy
+        # control, so return 1 - WER instead of the raw error rate
+        return 1 - result
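+
+Before launching quantization, it can be useful to compute a baseline
+metric for the FP16 model, so the accuracy drop reported by NNCF has a
+reference point. This is an optional sketch that assumes inference on
+CPU:
+
+.. code:: ipython2
+
+    core = openvino.Core()
+    compiled_fp_model = core.compile_model(model=ov_model, device_name='CPU')
+
+    # validation_fn returns 1 - WER, so higher is better
+    baseline_metric = validation_fn(compiled_fp_model, dataset)
+    print(f'Baseline accuracy (1 - WER): {baseline_metric:.4f}')
+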
+Run quantization with accuracy control `⇑ <#top>`__
+###############################################################################################################################
+
+You should provide the calibration dataset and the validation dataset.
+They can be the same dataset.
+
+- ``max_drop`` defines the accuracy drop threshold. The quantization
+  process stops when the degradation of the accuracy metric on the
+  validation dataset is less than the ``max_drop``. The default value
+  is 0.01. NNCF will stop the quantization and report an error if the
+  ``max_drop`` value can't be reached.
+- ``drop_type`` defines how the accuracy drop will be calculated:
+  ABSOLUTE (used by default) or RELATIVE.
+- ``ranking_subset_size`` is the size of a subset that is used to rank
+  layers by their contribution to the accuracy drop. The default value
+  is 300; the more samples it has, the better the ranking, potentially.
+  Here we use the value 25 to speed up the execution.
+
+.. note::
+
+   Execution can take tens of minutes and requires up to 10 GB
+   of free memory.
+
+
+.. code:: ipython2
+
+    from nncf.quantization.advanced_parameters import AdvancedAccuracyRestorerParameters
+    from nncf.parameters import ModelType
+
+    quantized_model = nncf.quantize_with_accuracy_control(
+        ov_model,
+        calibration_dataset=calibration_dataset,
+        validation_dataset=calibration_dataset,
+        validation_fn=validation_fn,
+        max_drop=0.01,
+        drop_type=nncf.DropType.ABSOLUTE,
+        model_type=ModelType.TRANSFORMER,
+        advanced_accuracy_restorer_parameters=AdvancedAccuracyRestorerParameters(
+            ranking_subset_size=25
+        ),
+    )
+
+Model Usage Example `⇑ <#top>`__
+###############################################################################################################################
+
+.. code:: ipython2
+
+    import IPython.display as ipd
+
+
+    ipd.Audio(test_sample["array"], rate=16000)
+
+.. code:: ipython2
+
+    core = openvino.Core()
+
+    compiled_quantized_model = core.compile_model(model=quantized_model, device_name='CPU')
+
+    input_data = np.expand_dims(test_sample["array"], axis=0)
+
+Next, make a prediction.
+
+.. code:: ipython2
+
+    predictions = compiled_quantized_model([input_data])[0]
+    predicted_ids = np.argmax(predictions, axis=-1)
+    transcription = processor.batch_decode(torch.from_numpy(predicted_ids))
+    transcription
+
+Compare Accuracy of the Original and Quantized Models `⇑ <#top>`__
+###############################################################################################################################
+
+- Define a dataloader for the test dataset.
+- Define inference functions for the PyTorch and OpenVINO models.
+- Define a function to compute the Word Error Rate.
+
+.. code:: ipython2
+
+    # inference function for pytorch
+    def torch_infer(model, sample):
+        logits = model(torch.Tensor(sample['input_values'])).logits
+        # take argmax and decode
+        predicted_ids = torch.argmax(logits, dim=-1)
+        transcription = processor.batch_decode(predicted_ids)
+        return transcription
+
+
+    # inference function for openvino
+    def ov_infer(model, sample):
+        output = model.output(0)
+        logits = model(np.array(sample['input_values']))[output]
+        predicted_ids = np.argmax(logits, axis=-1)
+        transcription = processor.batch_decode(torch.from_numpy(predicted_ids))
+        return transcription
+
+
+    def compute_wer(dataset, model, infer_fn):
+        wer = WordErrorRate()
+        for sample in tqdm(dataset):
+            # run infer function on sample
+            transcription = infer_fn(model, sample)
+            # update metric on sample result
+            wer.update(transcription, [sample['text']])
+        # finalize metric calculation
+        result = wer.compute()
+        return result
+
+Now, compute WER for the original PyTorch model and the quantized model.
+
+.. code:: ipython2
+
+    pt_result = compute_wer(dataset, torch_model, torch_infer)
+    quantized_result = compute_wer(dataset, compiled_quantized_model, ov_infer)
+
+    print(f'[PyTorch] Word Error Rate: {pt_result:.4f}')
+    print(f'[Quantized OpenVINO] Word Error Rate: {quantized_result:.4f}')
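+
+The introduction also lists a performance comparison. One way to
+measure it, sketched here under the assumption that the
+``benchmark_app`` command-line tool is available in the environment and
+that the illustrative file names below are acceptable, is to serialize
+both models and benchmark them with a fixed input shape:
+
+.. code:: ipython2
+
+    # Save both models so benchmark_app can consume them;
+    # the file names are illustrative assumptions
+    openvino.save_model(ov_model, 'wav2vec2_fp.xml')
+    openvino.save_model(quantized_model, 'wav2vec2_int8.xml')
+
+.. code:: ipython2
+
+    # Benchmark the original model; the shape matches MAX_SEQ_LENGTH used above
+    ! benchmark_app -m wav2vec2_fp.xml -shape "[1,30480]" -d CPU -api async
+
+.. code:: ipython2
+
+    # Benchmark the quantized model
+    ! benchmark_app -m wav2vec2_int8.xml -shape "[1,30480]" -d CPU -api async
+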
diff --git a/docs/notebooks/122-yolov8-quantization-with-accuracy-control-with-output.rst b/docs/notebooks/122-yolov8-quantization-with-accuracy-control-with-output.rst new file mode 100644 index 00000000000000..7bba4ef46f0c73 --- /dev/null +++ b/docs/notebooks/122-yolov8-quantization-with-accuracy-control-with-output.rst @@ -0,0 +1,306 @@
+Convert and Optimize YOLOv8 with OpenVINO™
+==========================================
+
+
+
+The YOLOv8 algorithm developed by Ultralytics is a cutting-edge,
+state-of-the-art (SOTA) model that is designed to be fast, accurate, and
+easy to use, making it an excellent choice for a wide range of object
+detection, image segmentation, and image classification tasks. More
+details about its realization can be found in the original model
+`repository `__.
+
+This tutorial demonstrates step-by-step instructions on how to apply
+quantization with accuracy control to the PyTorch YOLOv8 model. The
+advanced quantization flow allows applying 8-bit quantization to the
+model with control of the accuracy metric. This is achieved by keeping
+the most impactful operations within the model in the original
+precision. The flow is based on the `Basic 8-bit
+quantization `__
+and has the following differences:
+
+- Besides the calibration dataset, a validation dataset is required to
+  compute the accuracy metric. Both datasets can refer to the same data
+  in the simplest case.
+- A validation function, used to compute the accuracy metric, is
+  required. It can be a function that is already available in the
+  source framework or a custom function.
+- Since accuracy validation is run several times during the
+  quantization process, quantization with accuracy control can take
+  more time than the Basic 8-bit quantization flow.
+- The resulting model can provide a smaller performance improvement
+  than the Basic 8-bit quantization flow because some of the operations
+  are kept in the original precision.
+
+.. note::
+
+   Currently, 8-bit quantization with accuracy control in NNCF
+   is available only for models in OpenVINO representation.
+
+The steps for the quantization with accuracy control are described
+below.
+
+.. _top:
+
+The tutorial consists of the following steps:
+
+- `Prerequisites <#prerequisites>`__
+- `Get PyTorch model and OpenVINO IR model <#get-pytorch-model-and-openvino-ir-model>`__
+- `Define validator and data loader <#define-validator-and-data-loader>`__
+- `Prepare calibration and validation datasets <#prepare-calibration-and-validation-datasets>`__
+- `Prepare validation function <#prepare-validation-function>`__
+- `Run quantization with accuracy control <#run-quantization-with-accuracy-control>`__
+- `Compare Accuracy and Performance of the Original and Quantized Models <#compare-accuracy-and-performance-of-the-original-and-quantized-models>`__
+
+Prerequisites `⇑ <#top>`__
+###############################################################################################################################
+
+
+Install necessary packages.
+
+.. code:: ipython2
+
+    !pip install -q "openvino==2023.1.0.dev20230811"
+    !pip install git+https://github.com/openvinotoolkit/nncf.git@develop
+    !pip install -q "ultralytics==8.0.43"
+
+Get PyTorch model and OpenVINO IR model `⇑ <#top>`__
+###############################################################################################################################
+
+Generally, PyTorch models represent an instance of the
+`torch.nn.Module `__
+class, initialized by a state dictionary with model weights. We will
+use the YOLOv8 nano model (also known as ``yolov8n``) pre-trained on a
+COCO dataset, which is available in this
+`repo `__. Similar steps are
+also applicable to other YOLOv8 models. Typical steps to obtain a
+pre-trained model:
+
+1. Create an instance of a model class.
+2. Load a checkpoint state dict, which contains the pre-trained model
+   weights.
+
+In this case, the creators of the model provide an API that enables
+converting the YOLOv8 model to ONNX and then to OpenVINO IR. Therefore,
+we do not need to do these steps manually.
+
+.. code:: ipython2
+
+    import os
+    from pathlib import Path
+
+    from ultralytics import YOLO
+    from ultralytics.yolo.cfg import get_cfg
+    from ultralytics.yolo.data.utils import check_det_dataset
+    from ultralytics.yolo.engine.validator import BaseValidator as Validator
+    from ultralytics.yolo.utils import DATASETS_DIR
+    from ultralytics.yolo.utils import DEFAULT_CFG
+    from ultralytics.yolo.utils import ops
+    from ultralytics.yolo.utils.metrics import ConfusionMatrix
+
+    ROOT = os.path.abspath('')
+
+    MODEL_NAME = "yolov8n-seg"
+
+    model = YOLO(f"{ROOT}/{MODEL_NAME}.pt")
+    args = get_cfg(cfg=DEFAULT_CFG)
+    args.data = "coco128-seg.yaml"
+
+Load the model.
+
+.. code:: ipython2
+
+    import openvino
+
+
+    model_path = Path(f"{ROOT}/{MODEL_NAME}_openvino_model/{MODEL_NAME}.xml")
+    if not model_path.exists():
+        model.export(format="openvino", dynamic=True, half=False)
+
+    ov_model = openvino.Core().read_model(model_path)
+
+Define validator and data loader `⇑ <#top>`__
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+The original model repository uses a ``Validator`` wrapper, which
+represents the accuracy validation pipeline. It creates a dataloader
+and evaluation metrics, and updates the metrics on each data batch
+produced by the dataloader. Besides that, it is responsible for data
+preprocessing and results postprocessing. For class initialization, the
+configuration should be provided. We will use the default setup, but it
+can be replaced with overridden parameters to test on custom data. The
+model provides the ``ValidatorClass`` method, which creates a validator
+class instance.
+
+.. code:: ipython2
+
+    validator = model.ValidatorClass(args)
+    validator.data = check_det_dataset(args.data)
+    data_loader = validator.get_dataloader(f"{DATASETS_DIR}/coco128-seg", 1)
+
+    validator.is_coco = True
+    validator.class_map = ops.coco80_to_coco91_class()
+    validator.names = model.model.names
+    validator.metrics.names = validator.names
+    validator.nc = model.model.model[-1].nc
+    validator.nm = 32
+    validator.process = ops.process_mask
+    validator.plot_masks = []
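+
+To confirm the dataloader is wired correctly, you can optionally
+inspect a single batch before building the calibration dataset (this is
+a sketch; the keys and shape shown follow the Ultralytics batch format
+and are assumptions about this particular setup):
+
+.. code:: ipython2
+
+    # Peek at one raw batch produced by the validator's dataloader
+    batch = next(iter(data_loader))
+    print(batch.keys())
+    print(batch["img"].shape)  # e.g. torch.Size([1, 3, 640, 640])
+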
+Prepare calibration and validation datasets `⇑ <#top>`__
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+We can use one dataset as both the calibration and the validation
+dataset. Name it ``quantization_dataset``.
+
+.. code:: ipython2
+
+    from typing import Dict
+
+    import nncf
+
+
+    def transform_fn(data_item: Dict):
+        input_tensor = validator.preprocess(data_item)["img"].numpy()
+        return input_tensor
+
+
+    quantization_dataset = nncf.Dataset(data_loader, transform_fn)
+
+Prepare validation function `⇑ <#top>`__
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+.. code:: ipython2
+
+    from functools import partial
+
+    import torch
+    from nncf.quantization.advanced_parameters import AdvancedAccuracyRestorerParameters
+
+
+    def validation_ac(
+        compiled_model: openvino.CompiledModel,
+        validation_loader: torch.utils.data.DataLoader,
+        validator: Validator,
+        num_samples: int = None,
+    ) -> float:
+        validator.seen = 0
+        validator.jdict = []
+        validator.stats = []
+        validator.batch_i = 1
+        validator.confusion_matrix = ConfusionMatrix(nc=validator.nc)
+        num_outputs = len(compiled_model.outputs)
+
+        counter = 0
+        for batch_i, batch in enumerate(validation_loader):
+            if num_samples is not None and batch_i == num_samples:
+                break
+            batch = validator.preprocess(batch)
+            results = compiled_model(batch["img"])
+            if num_outputs == 1:
+                preds = torch.from_numpy(results[compiled_model.output(0)])
+            else:
+                preds = [
+                    torch.from_numpy(results[compiled_model.output(0)]),
+                    torch.from_numpy(results[compiled_model.output(1)]),
+                ]
+            preds = validator.postprocess(preds)
+            validator.update_metrics(preds, batch)
+            counter += 1
+        stats = validator.get_stats()
+        if num_outputs == 1:
+            stats_metrics = stats["metrics/mAP50-95(B)"]
+        else:
+            stats_metrics = stats["metrics/mAP50-95(M)"]
+        print(f"Validate: dataset length = {counter}, metric value = {stats_metrics:.3f}")
+
+        return stats_metrics
+
+
+    validation_fn = partial(validation_ac, validator=validator)
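+
+Since the full COCO128 validation loop is relatively slow, a short
+smoke test of the metric pipeline can catch wiring mistakes before
+quantization starts. This optional sketch compiles the FP32 model on
+CPU and validates only a few batches via the ``num_samples`` argument:
+
+.. code:: ipython2
+
+    core = openvino.Core()
+    compiled_fp_model = core.compile_model(model=ov_model, device_name='CPU')
+
+    # Validate on a handful of batches only; the metric value here is
+    # just a plumbing check, not a meaningful accuracy estimate
+    validation_ac(compiled_fp_model, data_loader, validator, num_samples=5)
+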
+Run quantization with accuracy control `⇑ <#top>`__
+###############################################################################################################################
+
+You should provide the calibration dataset and the validation dataset.
+They can be the same dataset.
+
+- ``max_drop`` defines the accuracy drop threshold. The quantization
+  process stops when the degradation of the accuracy metric on the
+  validation dataset is less than the ``max_drop``. The default value
+  is 0.01. NNCF will stop the quantization and report an error if the
+  ``max_drop`` value can't be reached.
+- ``drop_type`` defines how the accuracy drop will be calculated:
+  ABSOLUTE (used by default) or RELATIVE.
+- ``ranking_subset_size`` is the size of a subset that is used to rank
+  layers by their contribution to the accuracy drop. The default value
+  is 300; the more samples it has, the better the ranking, potentially.
+  Here we use the value 25 to speed up the execution.
+
+.. note::
+
+   Execution can take tens of minutes and requires up to 15 GB
+   of free memory.
+
+.. code:: ipython2
+
+    quantized_model = nncf.quantize_with_accuracy_control(
+        ov_model,
+        quantization_dataset,
+        quantization_dataset,
+        validation_fn=validation_fn,
+        max_drop=0.01,
+        preset=nncf.QuantizationPreset.MIXED,
+        advanced_accuracy_restorer_parameters=AdvancedAccuracyRestorerParameters(
+            ranking_subset_size=25,
+            num_ranking_processes=1
+        ),
+    )
+
+Compare Accuracy and Performance of the Original and Quantized Models `⇑ <#top>`__
+###############################################################################################################################
+
+
+Now we can compare metrics of the original, non-quantized OpenVINO IR
+model and the quantized OpenVINO IR model to make sure that the
+``max_drop`` is not exceeded.
+
+.. code:: ipython2
+
+    import openvino
+
+    core = openvino.Core()
+    quantized_compiled_model = core.compile_model(model=quantized_model, device_name='CPU')
+    compiled_ov_model = core.compile_model(model=ov_model, device_name='CPU')
+
+    fp_result = validation_ac(compiled_ov_model, data_loader, validator)
+    quantized_result = validation_ac(quantized_compiled_model, data_loader, validator)
+
+
+    print(f'[Original OpenVINO]: {fp_result:.4f}')
+    print(f'[Quantized OpenVINO]: {quantized_result:.4f}')
+
+Finally, compare performance.
+
+.. code:: ipython2
+
+    from pathlib import Path
+    # Set model directory
+    MODEL_DIR = Path("model")
+    MODEL_DIR.mkdir(exist_ok=True)
+
+    ir_model_path = MODEL_DIR / 'ir_model.xml'
+    quantized_model_path = MODEL_DIR / 'quantized_model.xml'
+
+    # Save models to use them in the command-line benchmark app
+    openvino.save_model(ov_model, ir_model_path, compress_to_fp16=False)
+    openvino.save_model(quantized_model, quantized_model_path, compress_to_fp16=False)
+
+.. code:: ipython2
+
+    # Inference Original model (OpenVINO IR)
+    ! benchmark_app -m $ir_model_path -shape "[1,3,640,640]" -d CPU -api async
+
+.. code:: ipython2
+
+    # Inference Quantized model (OpenVINO IR)
+    ! benchmark_app -m $quantized_model_path -shape "[1,3,640,640]" -d CPU -api async
diff --git a/docs/notebooks/201-vision-monodepth-with-output.rst b/docs/notebooks/201-vision-monodepth-with-output.rst index 165a2b6f49cfee..b26e82ff4e7ff0 100644 --- a/docs/notebooks/201-vision-monodepth-with-output.rst +++ b/docs/notebooks/201-vision-monodepth-with-output.rst @@ -1,7 +1,7 @@ Monodepth Estimation with OpenVINO ================================== -.. _top: + This tutorial demonstrates Monocular Depth Estimation with MidasNet in OpenVINO. Model information can be found @@ -30,6 +30,8 @@ Transfer,” `__ in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: ``10.1109/TPAMI.2020.3019967``. +.. _top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/202-vision-superresolution-image-with-output.rst b/docs/notebooks/202-vision-superresolution-image-with-output.rst index 49061b1ab2cee7..2cdac3c01bf069 100644 --- a/docs/notebooks/202-vision-superresolution-image-with-output.rst +++ b/docs/notebooks/202-vision-superresolution-image-with-output.rst @@ -1,7 +1,7 @@ Single Image Super Resolution with OpenVINO™ ============================================ -.. _top: + Super Resolution is the process of enhancing the quality of an image by increasing the pixel count using deep learning. This notebook shows the @@ -16,6 +16,8 @@ Resolution,” `__ 2018 24th International Conference on Pattern Recognition (ICPR), 2018, pp. 2777-2784, doi: 10.1109/ICPR.2018.8545760. +.. 
_top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/202-vision-superresolution-video-with-output.rst b/docs/notebooks/202-vision-superresolution-video-with-output.rst index 1a5da455211120..c2534fe7a3cc38 100644 --- a/docs/notebooks/202-vision-superresolution-video-with-output.rst +++ b/docs/notebooks/202-vision-superresolution-video-with-output.rst @@ -1,7 +1,7 @@ Video Super Resolution with OpenVINO™ ===================================== -.. _top: + Super Resolution is the process of enhancing the quality of an image by increasing the pixel count using deep learning. This notebook applies @@ -23,6 +23,8 @@ pp. 2777-2784, doi: 10.1109/ICPR.2018.8545760. video. +.. _top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/203-meter-reader-with-output.rst b/docs/notebooks/203-meter-reader-with-output.rst index b94c3f996d322f..0efdff356edd3e 100644 --- a/docs/notebooks/203-meter-reader-with-output.rst +++ b/docs/notebooks/203-meter-reader-with-output.rst @@ -1,7 +1,7 @@ Industrial Meter Reader ======================= -.. _top: + This notebook shows how to create a industrial meter reader with OpenVINO Runtime. We use the pre-trained @@ -21,6 +21,8 @@ to build up a multiple inference task pipeline: workflow +.. _top: + **Table of contents**: - `Import <#import>`__ diff --git a/docs/notebooks/204-segmenter-semantic-segmentation-with-output.rst b/docs/notebooks/204-segmenter-semantic-segmentation-with-output.rst index 58cbada5be05b2..750508af69864b 100644 --- a/docs/notebooks/204-segmenter-semantic-segmentation-with-output.rst +++ b/docs/notebooks/204-segmenter-semantic-segmentation-with-output.rst @@ -1,7 +1,7 @@ Semantic Segmentation with OpenVINO™ using Segmenter ==================================================== -.. _top: + Semantic segmentation is a difficult computer vision problem with many applications such as autonomous driving, robotics, augmented reality, @@ -28,6 +28,8 @@ paper: `Segmenter: Transformer for Semantic Segmentation `__ or in the `repository `__. +.. _top: + **Table of contents**: - `Get and prepare PyTorch model <#get-and-prepare-pytorch-model>`__ diff --git a/docs/notebooks/205-vision-background-removal-with-output.rst b/docs/notebooks/205-vision-background-removal-with-output.rst index b22856be7c5e4a..976361459d5696 100644 --- a/docs/notebooks/205-vision-background-removal-with-output.rst +++ b/docs/notebooks/205-vision-background-removal-with-output.rst @@ -1,7 +1,7 @@ Image Background Removal with U^2-Net and OpenVINO™ =================================================== -.. _top: + This notebook demonstrates background removal in images using U\ :math:`^2`-Net and OpenVINO. @@ -17,6 +17,8 @@ The model source is available `here `__. +.. _top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/206-vision-paddlegan-anime-with-output.rst b/docs/notebooks/206-vision-paddlegan-anime-with-output.rst index db75f33877699d..0dc6a88ad4c929 100644 --- a/docs/notebooks/206-vision-paddlegan-anime-with-output.rst +++ b/docs/notebooks/206-vision-paddlegan-anime-with-output.rst @@ -1,7 +1,7 @@ Photos to Anime with PaddleGAN and OpenVINO =========================================== -.. 
_top: + This tutorial demonstrates converting a `PaddlePaddle/PaddleGAN `__ @@ -16,6 +16,8 @@ documentation `__ diff --git a/docs/notebooks/207-vision-paddlegan-superresolution-with-output.rst b/docs/notebooks/207-vision-paddlegan-superresolution-with-output.rst index 5967a0bf7b199c..b19bfc982c628f 100644 --- a/docs/notebooks/207-vision-paddlegan-superresolution-with-output.rst +++ b/docs/notebooks/207-vision-paddlegan-superresolution-with-output.rst @@ -1,7 +1,7 @@ Super Resolution with PaddleGAN and OpenVINO™ ============================================= -.. _top: + This notebook demonstrates converting the RealSR (real-world super-resolution) model from @@ -18,6 +18,8 @@ from CVPR 2020. This notebook works best with small images (up to 800x600 resolution). +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/208-optical-character-recognition-with-output.rst b/docs/notebooks/208-optical-character-recognition-with-output.rst index 5f15a81edf3aa5..30524055a60e84 100644 --- a/docs/notebooks/208-optical-character-recognition-with-output.rst +++ b/docs/notebooks/208-optical-character-recognition-with-output.rst @@ -1,7 +1,7 @@ Optical Character Recognition (OCR) with OpenVINO™ ================================================== -.. _top: + This tutorial demonstrates how to perform optical character recognition (OCR) with OpenVINO models. It is a continuation of the @@ -21,6 +21,8 @@ Zoo `__. For more information, refer to the `104-model-tools <104-model-tools-with-output.html>`__ tutorial. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/209-handwritten-ocr-with-output.rst b/docs/notebooks/209-handwritten-ocr-with-output.rst index 10dba766b1da8e..454802bf631b0e 100644 --- a/docs/notebooks/209-handwritten-ocr-with-output.rst +++ b/docs/notebooks/209-handwritten-ocr-with-output.rst @@ -1,7 +1,7 @@ Handwritten Chinese and Japanese OCR with OpenVINO™ =================================================== -.. _top: + In this tutorial, we perform optical character recognition (OCR) for handwritten Chinese (simplified) and Japanese. An OCR tutorial using the @@ -19,6 +19,8 @@ and `scut_ept `__ charlists are used. Both models are available on `Open Model Zoo `__. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/210-slowfast-video-recognition-with-output.rst b/docs/notebooks/210-slowfast-video-recognition-with-output.rst index e795d99a6ef5ee..c2bcfa25c5d064 100644 --- a/docs/notebooks/210-slowfast-video-recognition-with-output.rst +++ b/docs/notebooks/210-slowfast-video-recognition-with-output.rst @@ -1,7 +1,7 @@ Video Recognition using SlowFast and OpenVINO™ ============================================== -.. _top: + Teaching machines to detect, understand and analyze the contents of images has been one of the more well-known and well-studied problems in @@ -40,6 +40,8 @@ This tutorial consists of the following steps .. |image0| image:: https://user-images.githubusercontent.com/34324155/143044111-94676f64-7ba8-4081-9011-f8054bed7030.png +.. _top: + **Table of contents**: - `Prepare PyTorch Model <#prepare-pytorch-model>`__ diff --git a/docs/notebooks/211-speech-to-text-with-output.rst b/docs/notebooks/211-speech-to-text-with-output.rst index 080d8b092c9a09..95d919eb6d637f 100644 --- a/docs/notebooks/211-speech-to-text-with-output.rst +++ b/docs/notebooks/211-speech-to-text-with-output.rst @@ -1,7 +1,7 @@ Speech to Text with OpenVINO™ ============================= -.. 
_top: + This tutorial demonstrates speech-to-text recognition with OpenVINO. @@ -13,6 +13,8 @@ with Connectionist Temporal Classification (CTC) loss. The model is available from `Open Model Zoo `__. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst b/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst index 8fabfbf8b90a1e..2e8af021276050 100644 --- a/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst +++ b/docs/notebooks/212-pyannote-speaker-diarization-with-output.rst @@ -1,7 +1,7 @@ Speaker diarization =================== -.. _top: + Speaker diarization is the process of partitioning an audio stream containing human speech into homogeneous segments according to the @@ -39,6 +39,8 @@ card `__, `repo `__ and `paper `__. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/213-question-answering-with-output.rst b/docs/notebooks/213-question-answering-with-output.rst index e3fc0ee6c8d144..9b1be824b7a9a9 100644 --- a/docs/notebooks/213-question-answering-with-output.rst +++ b/docs/notebooks/213-question-answering-with-output.rst @@ -1,7 +1,7 @@ Interactive question answering with OpenVINO™ ============================================= -.. _top: + This demo shows interactive question answering with OpenVINO, using `small BERT-large-like @@ -11,6 +11,8 @@ larger BERT-large model. The model comes from `Open Model Zoo `__. Final part of this notebook provides live inference results from your inputs. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/214-grammar-correction-with-output.rst b/docs/notebooks/214-grammar-correction-with-output.rst index eaff3b6e620411..434aabbacd3490 100644 --- a/docs/notebooks/214-grammar-correction-with-output.rst +++ b/docs/notebooks/214-grammar-correction-with-output.rst @@ -1,7 +1,7 @@ Grammatical Error Correction with OpenVINO ========================================== -.. _top: + AI-based auto-correction products are becoming increasingly popular due to their ease of use, editing speed, and affordability. These products @@ -43,6 +43,8 @@ It consists of the following steps: Optimum `__. - Create an inference pipeline for grammatical error checking +.. _top: + **Table of contents**: - `How does it work? <#how-does-it-work>`__ diff --git a/docs/notebooks/215-image-inpainting-with-output.rst b/docs/notebooks/215-image-inpainting-with-output.rst index d19f2424c062de..c431ee55da9359 100644 --- a/docs/notebooks/215-image-inpainting-with-output.rst +++ b/docs/notebooks/215-image-inpainting-with-output.rst @@ -1,7 +1,7 @@ Image In-painting with OpenVINO™ -------------------------------- -.. _top: + This notebook demonstrates how to use an image in-painting model with OpenVINO, using `GMCNN @@ -11,6 +11,8 @@ given a tampered image, is able to create something very similar to the original image. The Following pipeline will be used in this notebook. |pipeline| +.. _top: + **Table of contents**: - `Download the Model <#download-the-model>`__ diff --git a/docs/notebooks/216-attention-center-with-output.rst b/docs/notebooks/216-attention-center-with-output.rst index 2f89f1eb2366e2..0e50d17ec85e40 100644 --- a/docs/notebooks/216-attention-center-with-output.rst +++ b/docs/notebooks/216-attention-center-with-output.rst @@ -1,7 +1,7 @@ The attention center model with OpenVINO™ ========================================= -.. 
This notebook demonstrates how to use the `attention center model `__ with @@ -51,6 +51,8 @@ The attention center model has been trained with images from the `COCO dataset `__ annotated with saliency from the `SALICON dataset `__. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/217-vision-deblur-with-output.rst b/docs/notebooks/217-vision-deblur-with-output.rst index 57d9f2c82b7ea8..1241fab1900fa3 100644 --- a/docs/notebooks/217-vision-deblur-with-output.rst +++ b/docs/notebooks/217-vision-deblur-with-output.rst @@ -1,6 +1,8 @@ Deblur Photos with DeblurGAN-v2 and OpenVINO™ ============================================= + + .. _top: **Table of contents**: diff --git a/docs/notebooks/218-vehicle-detection-and-recognition-with-output.rst b/docs/notebooks/218-vehicle-detection-and-recognition-with-output.rst index c5237117f8a960..2bc8a6cd2e9d94 100644 --- a/docs/notebooks/218-vehicle-detection-and-recognition-with-output.rst +++ b/docs/notebooks/218-vehicle-detection-and-recognition-with-output.rst @@ -1,7 +1,7 @@ Vehicle Detection And Recognition with OpenVINO™ ================================================ -.. _top: + This tutorial demonstrates how to use two pre-trained models from `Open Model Zoo `__: @@ -19,6 +19,8 @@ As a result, you can get: result +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/219-knowledge-graphs-conve-with-output.rst b/docs/notebooks/219-knowledge-graphs-conve-with-output.rst index c0fc8e07ebcf6e..67c8776cff2d4a 100644 --- a/docs/notebooks/219-knowledge-graphs-conve-with-output.rst +++ b/docs/notebooks/219-knowledge-graphs-conve-with-output.rst @@ -1,7 +1,7 @@ OpenVINO optimizations for Knowledge graphs =========================================== -.. _top: + The goal of this notebook is to showcase performance optimizations for the ConvE knowledge graph embeddings model using the Intel® Distribution @@ -18,6 +18,8 @@ The ConvE model is an implementation of the paper - sample dataset can be downloaded from: https://github.com/TimDettmers/ConvE/tree/master/countries/countries_S1 +.. _top: + **Table of contents**: - `Windows specific settings <#windows-specific-settings>`__ diff --git a/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst b/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst index b38031cada0bef..fd4179634a992a 100644 --- a/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst +++ b/docs/notebooks/220-cross-lingual-books-alignment-with-output.rst @@ -1,7 +1,7 @@ Cross-lingual Books Alignment with Transformers and OpenVINO™ ============================================================= -.. _top: + Cross-lingual text alignment is the task of matching sentences in a pair of texts that are translations of each other. In this notebook, you’ll @@ -39,6 +39,8 @@ Prerequisites - ``seaborn`` - for alignment matrix visualization - ``ipywidgets`` - for displaying HTML and JS output in the notebook +.. _top: + **Table of contents**: - `Get Books <#get-books>`__ diff --git a/docs/notebooks/221-machine-translation-with-output.rst b/docs/notebooks/221-machine-translation-with-output.rst index f8c36d8b482fb4..b4103a43f252bd 100644 --- a/docs/notebooks/221-machine-translation-with-output.rst +++ b/docs/notebooks/221-machine-translation-with-output.rst @@ -1,7 +1,7 @@ Machine translation demo ======================== -.. _top: + This demo utilizes Intel’s pre-trained model that translates from English to German. More information about the model can be found @@ -18,6 +18,8 @@ following structure: ```` + *tokenized sentence* + ```` **Output** After the inference, we have a sequence of up to 200 tokens. The structure is the same as the one for the input. +.. _top: + **Table of contents**: - `Downloading model <#downloading-model>`__ diff --git a/docs/notebooks/222-vision-image-colorization-with-output.rst b/docs/notebooks/222-vision-image-colorization-with-output.rst index 6ae30f4ecf0d89..8d11c9030fc30e 100644 --- a/docs/notebooks/222-vision-image-colorization-with-output.rst +++ b/docs/notebooks/222-vision-image-colorization-with-output.rst @@ -1,7 +1,7 @@ Image Colorization with OpenVINO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. _top: + This notebook demonstrates how to colorize images with OpenVINO using the Colorization model @@ -44,6 +44,8 @@ About Colorization-siggraph See the `colorization `__ repository for more details. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/223-text-prediction-with-output.rst b/docs/notebooks/223-text-prediction-with-output.rst index 15110bc0bc3939..97a7a1d8d8542e 100644 --- a/docs/notebooks/223-text-prediction-with-output.rst +++ b/docs/notebooks/223-text-prediction-with-output.rst @@ -1,7 +1,7 @@ Text Prediction with OpenVINO™ ============================== -.. _top: + This notebook shows text prediction with OpenVINO. This notebook can work in two different modes, Text Generation and Conversation, which the @@ -73,6 +73,8 @@ above. The generated response is added to the history with the and the sequence is passed back into the model. +.. _top: + **Table of contents**: - `Model Selection <#model-selection>`__ diff --git a/docs/notebooks/224-3D-segmentation-point-clouds-with-output.rst b/docs/notebooks/224-3D-segmentation-point-clouds-with-output.rst index fde076943bb6f7..8dcb6fa3b7f11d 100644 --- a/docs/notebooks/224-3D-segmentation-point-clouds-with-output.rst +++ b/docs/notebooks/224-3D-segmentation-point-clouds-with-output.rst @@ -1,7 +1,7 @@ Part Segmentation of 3D Point Clouds with OpenVINO™ =================================================== -.. _top: + This notebook demonstrates how to process `point cloud `__ data and run 3D @@ -24,6 +24,8 @@ segmentation, to scene semantic parsing. It is highly efficient and effective, showing strong performance on par or even better than state of the art. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst b/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst index 90f6243f6c3dda..255e3b6b2a51cc 100644 --- a/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst +++ b/docs/notebooks/225-stable-diffusion-text-to-image-with-output.rst @@ -1,7 +1,7 @@ Text-to-Image Generation with Stable Diffusion and OpenVINO™ ============================================================ -.. _top: + Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from @@ -41,6 +41,8 @@ Notebook contains the following steps: API. 3. Run Stable Diffusion pipeline with OpenVINO. +.. _top: +
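The fixed-length input framing described in the machine-translation demo above — a start marker, the tokenized sentence, an end marker, then padding — can be sketched as follows (the special-token ids here are placeholders; the real ids come from the demo's tokenizer):

.. code:: python

   import numpy as np

   MAX_TOKENS = 200  # matches the sequence length described above
   SOS, EOS, PAD = 1, 2, 0  # placeholder ids for the start, end and padding tokens

   def frame_input(token_ids: list) -> np.ndarray:
       """Wrap a tokenized sentence in start/end markers and pad to a fixed length."""
       ids = [SOS] + list(token_ids)[: MAX_TOKENS - 2] + [EOS]
       ids += [PAD] * (MAX_TOKENS - len(ids))
       return np.asarray([ids])  # shape [1, MAX_TOKENS]: one sentence per batch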
**Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/226-yolov7-optimization-with-output.rst b/docs/notebooks/226-yolov7-optimization-with-output.rst index 34b7c05d209af2..e87f4de95642ce 100644 --- a/docs/notebooks/226-yolov7-optimization-with-output.rst +++ b/docs/notebooks/226-yolov7-optimization-with-output.rst @@ -1,7 +1,7 @@ Convert and Optimize YOLOv7 with OpenVINO™ ========================================== -.. _top: + The YOLOv7 algorithm is making big waves in the computer vision and machine learning communities. It is a real-time object detection @@ -40,6 +40,8 @@ The tutorial consists of the following steps: - Compare accuracy of the FP32 and quantized models. - Compare performance of the FP32 and quantized models. +.. _top: + **Table of contents**: - `Get Pytorch model <#get-pytorch-model>`__ diff --git a/docs/notebooks/227-whisper-subtitles-generation-with-output.rst b/docs/notebooks/227-whisper-subtitles-generation-with-output.rst index 05b04c2fec8dd3..39d210defad500 100644 --- a/docs/notebooks/227-whisper-subtitles-generation-with-output.rst +++ b/docs/notebooks/227-whisper-subtitles-generation-with-output.rst @@ -1,7 +1,7 @@ Video Subtitle Generation using Whisper and OpenVINO™ ===================================================== -.. _top: + `Whisper `__ is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and @@ -26,6 +26,8 @@ Download the model. 2. Instantiate the PyTorch model pipeline. 3. Export the ONNX model and convert it to OpenVINO IR, using model conversion API. 4. Run the Whisper pipeline with OpenVINO models. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/228-clip-zero-shot-convert-with-output.rst b/docs/notebooks/228-clip-zero-shot-convert-with-output.rst index 913817a8a4e34a..63f70768c20f1a 100644 --- a/docs/notebooks/228-clip-zero-shot-convert-with-output.rst +++ b/docs/notebooks/228-clip-zero-shot-convert-with-output.rst @@ -1,7 +1,7 @@ Zero-shot Image Classification with OpenAI CLIP and OpenVINO™ ============================================================= -.. _top: + Zero-shot image classification is a computer vision task to classify images into one of several classes without any prior training or @@ -30,6 +30,8 @@ image classification. The notebook contains the following steps: conversion API. 4. Run CLIP with OpenVINO. +.. _top: + **Table of contents**: - `Instantiate model <#instantiate-model>`__ diff --git a/docs/notebooks/228-clip-zero-shot-quantize-with-output.rst b/docs/notebooks/228-clip-zero-shot-quantize-with-output.rst index f6c2d4fb2f0bdf..1e335a73b2fd53 100644 --- a/docs/notebooks/228-clip-zero-shot-quantize-with-output.rst +++ b/docs/notebooks/228-clip-zero-shot-quantize-with-output.rst @@ -1,7 +1,7 @@ Post-Training Quantization of OpenAI CLIP model with NNCF ========================================================= -.. _top: + The goal of this tutorial is to demonstrate how to speed up the model by applying 8-bit post-training quantization from @@ -23,6 +23,8 @@ The optimization process contains the following steps: notebook first to generate OpenVINO IR model that is used for quantization. +.. _top: +
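Zero-shot classification with CLIP, as used in both notebooks above, reduces to embedding one image and several candidate captions, then taking a softmax over their similarities. A minimal PyTorch-side sketch (checkpoint name, image path and labels are illustrative, not the notebooks' pinned choices):

.. code:: python

   from PIL import Image
   from transformers import CLIPModel, CLIPProcessor

   checkpoint = "openai/clip-vit-base-patch16"  # assumed checkpoint
   model = CLIPModel.from_pretrained(checkpoint)
   processor = CLIPProcessor.from_pretrained(checkpoint)

   labels = ["cat", "dog", "bird"]
   inputs = processor(text=[f"a photo of a {label}" for label in labels],
                      images=Image.open("sample.jpg"), return_tensors="pt", padding=True)
   probs = model(**inputs).logits_per_image.softmax(dim=-1)  # one probability per label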
**Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/229-distilbert-sequence-classification-with-output.rst b/docs/notebooks/229-distilbert-sequence-classification-with-output.rst index 23f8369e3dcc34..4095cec55bf40f 100644 --- a/docs/notebooks/229-distilbert-sequence-classification-with-output.rst +++ b/docs/notebooks/229-distilbert-sequence-classification-with-output.rst @@ -1,7 +1,7 @@ Sentiment Analysis with OpenVINO™ ================================= -.. _top: + **Sentiment analysis** is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically @@ -9,6 +9,8 @@ identify, extract, quantify, and study affective states and subjective information. This notebook demonstrates how to convert and run a sequence classification model using OpenVINO. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/230-yolov8-optimization-with-output.rst b/docs/notebooks/230-yolov8-optimization-with-output.rst index 5d6270f86413a8..e8a9f31bcd6d0b 100644 --- a/docs/notebooks/230-yolov8-optimization-with-output.rst +++ b/docs/notebooks/230-yolov8-optimization-with-output.rst @@ -1,7 +1,7 @@ Convert and Optimize YOLOv8 with OpenVINO™ ========================================== -.. _top: + The YOLOv8 algorithm developed by Ultralytics is a cutting-edge, state-of-the-art (SOTA) model that is designed to be fast, accurate, and @@ -39,6 +39,8 @@ The tutorial consists of the following steps: - Compare performance of the FP32 and quantized models. - Compare accuracy of the FP32 and quantized models. +.. _top: + **Table of contents**: - `Get Pytorch model <#get-pytorch-model>`__ diff --git a/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst b/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst index bf63a422e49bcf..308a358d1c51fc 100644 --- a/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst +++ b/docs/notebooks/231-instruct-pix2pix-image-editing-with-output.rst @@ -1,7 +1,7 @@ Image Editing with InstructPix2Pix and OpenVINO =============================================== -.. _top: + InstructPix2Pix is a conditional diffusion model that edits images based on written instructions provided by the user. Generative image @@ -31,6 +31,8 @@ Notebook contains the following steps: 3. Run InstructPix2Pix pipeline with OpenVINO. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/233-blip-visual-language-processing-with-output.rst b/docs/notebooks/233-blip-visual-language-processing-with-output.rst index 2637f314bf1d32..8468422b451f40 100644 --- a/docs/notebooks/233-blip-visual-language-processing-with-output.rst +++ b/docs/notebooks/233-blip-visual-language-processing-with-output.rst @@ -1,7 +1,7 @@ Visual Question Answering and Image Captioning using BLIP and OpenVINO ====================================================================== -.. _top: + Humans perceive the world through vision and language. A longtime goal of AI is to build intelligent agents that can understand the world @@ -24,6 +24,8 @@ The tutorial consists of the following parts: 2. Convert the BLIP model to OpenVINO IR. 3. Run visual question answering and image captioning with OpenVINO. +.. _top: +
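The quantization workflows referenced above share one post-training skeleton: read the FP32 IR, wrap a calibration set in ``nncf.Dataset``, and call ``nncf.quantize``. A minimal sketch (the model path and ``my_dataloader`` are assumptions):

.. code:: python

   import nncf
   from openvino.runtime import Core, serialize

   core = Core()
   model = core.read_model("model.xml")  # FP32 IR produced by an earlier step

   def transform_fn(data_item):
       # Map one dataset item to the model's input(s); task-specific in practice.
       return data_item

   calibration_dataset = nncf.Dataset(my_dataloader, transform_fn)  # my_dataloader assumed
   quantized_model = nncf.quantize(model, calibration_dataset)
   serialize(quantized_model, "model_int8.xml")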
**Table of contents**: - `Background <#background>`__ diff --git a/docs/notebooks/234-encodec-audio-compression-with-output.rst b/docs/notebooks/234-encodec-audio-compression-with-output.rst index 309214879cdbde..7e98b009f940ba 100644 --- a/docs/notebooks/234-encodec-audio-compression-with-output.rst +++ b/docs/notebooks/234-encodec-audio-compression-with-output.rst @@ -1,7 +1,7 @@ Audio compression with EnCodec and OpenVINO =========================================== -.. _top: + Compression is an important part of the Internet today because it enables people to easily share high-quality photos, listen to audio @@ -28,6 +28,8 @@ and original `repo `__. image.png +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/235-controlnet-stable-diffusion-with-output.rst b/docs/notebooks/235-controlnet-stable-diffusion-with-output.rst index 39afa1c1734440..471e72ca3d0aea 100644 --- a/docs/notebooks/235-controlnet-stable-diffusion-with-output.rst +++ b/docs/notebooks/235-controlnet-stable-diffusion-with-output.rst @@ -1,7 +1,7 @@ Text-to-Image Generation with ControlNet Conditioning ===================================================== -.. _top: + Diffusion models have revolutionized AI-generated art. This technology enables the creation of high-quality images simply by writing a text prompt. @@ -141,6 +141,8 @@ of the target in the image: This tutorial focuses mainly on conditioning by pose. However, the discussed steps are also applicable to other annotation modes. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst index 7e1ab803c07af3..75656ed47aa094 100644 --- a/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-infinite-zoom-with-output.rst @@ -1,7 +1,7 @@ Infinite Zoom Stable Diffusion v2 and OpenVINO™ =============================================== -.. _top: + Stable Diffusion v2 is the next generation of the Stable Diffusion model, a text-to-image latent diffusion model created by the researchers and @@ -74,6 +74,8 @@ Notebook contains the following steps: 3. Run Stable Diffusion v2 inpainting pipeline for generating an infinite zoom video +.. _top: + **Table of contents**: - `Stable Diffusion v2 Infinite Zoom Showcase <#stable-diffusion-v2-infinite-zoom-showcase>`__ diff --git a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst index 59df2505a79d6f..ff8f9a9350f7ad 100644 --- a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-comparison-with-output.rst @@ -1,10 +1,12 @@ Stable Diffusion v2.1 using Optimum-Intel OpenVINO and multiple Intel Hardware ============================================================================== -.. _top: + |image0| +.. _top: +
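Before any OpenVINO conversion, the pose-conditioned pipeline described in the ControlNet tutorial above looks roughly like this diffusers sketch (the checkpoint names and the pose image are assumptions; the notebook may pin different ones):

.. code:: python

   from PIL import Image
   from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

   controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
   pipe = StableDiffusionControlNetPipeline.from_pretrained(
       "runwayml/stable-diffusion-v1-5", controlnet=controlnet)

   pose = Image.open("pose.png")  # a skeleton rendering produced by a pose annotator
   image = pipe("Dancer on a bright stage", image=pose,
                num_inference_steps=20).images[0]
   image.save("result.png")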
**Table of contents**: - `Showing Info Available Devices <#showing-info-available-devices>`__ diff --git a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-with-output.rst index 5a053a454ae0f1..bfa6ef6dce9ef0 100644 --- a/docs/notebooks/236-stable-diffusion-v2-optimum-demo-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-optimum-demo-with-output.rst @@ -1,10 +1,12 @@ Stable Diffusion v2.1 using Optimum-Intel OpenVINO ================================================== -.. _top: + |image0| +.. _top: + **Table of contents**: - `Showing Info Available Devices <#showing-info-available-devices>`__ diff --git a/docs/notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.rst index fc0468612224fe..7cd65143c0b083 100644 --- a/docs/notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-text-to-image-demo-with-output.rst @@ -1,7 +1,7 @@ Stable Diffusion Text-to-Image Demo =================================== -.. _top: + Stable Diffusion is an innovative generative AI technique that allows us to generate and manipulate images in interesting ways, including @@ -26,6 +26,8 @@ promising results for selecting a wide range of input text prompts! `236-stable-diffusion-v2-text-to-image `__. +.. _top: + **Table of contents**: - `Step 0: Install and import prerequisites <#step-0-install-and-import-prerequisites>`__ diff --git a/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst b/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst index 62bf25eb5d3753..33a4df82bfdbfc 100644 --- a/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst +++ b/docs/notebooks/236-stable-diffusion-v2-text-to-image-with-output.rst @@ -1,7 +1,7 @@ Text-to-Image Generation with Stable Diffusion v2 and OpenVINO™ =============================================================== -.. _top: + Stable Diffusion v2 is the next generation of the Stable Diffusion model, a text-to-image latent diffusion model created by the researchers and @@ -81,6 +81,8 @@ Notebook contains the following steps: notebook `__. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/237-segment-anything-with-output.rst b/docs/notebooks/237-segment-anything-with-output.rst index 121fc6dca6c6fe..2db34401ec919d 100644 --- a/docs/notebooks/237-segment-anything-with-output.rst +++ b/docs/notebooks/237-segment-anything-with-output.rst @@ -1,6 +1,8 @@ Object masks from prompts with SAM and OpenVINO =============================================== + + .. _top: **Table of contents**: diff --git a/docs/notebooks/238-deep-floyd-if-with-output.rst b/docs/notebooks/238-deep-floyd-if-with-output.rst index 9e438e5bc06bd5..5dd2e369c1b2b8 100644 --- a/docs/notebooks/238-deep-floyd-if-with-output.rst +++ b/docs/notebooks/238-deep-floyd-if-with-output.rst @@ -1,8 +1,6 @@ Image generation with DeepFloyd IF and OpenVINO™ ================================================ -.. _top: - DeepFloyd IF is an advanced open-source text-to-image model that delivers remarkable photorealism and language comprehension. DeepFloyd IF consists of a frozen text encoder and three cascaded pixel diffusion @@ -78,6 +76,10 @@ vector in embedded space. conventional Super Resolution network to get hi-res results. + + +.. _top: +
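The Optimum-Intel demos above hide the model conversion behind a drop-in pipeline class. A minimal sketch (the checkpoint name is an assumption; ``export=True`` converts the PyTorch weights to OpenVINO IR on the fly):

.. code:: python

   from optimum.intel import OVStableDiffusionPipeline

   pipe = OVStableDiffusionPipeline.from_pretrained(
       "stabilityai/stable-diffusion-2-1", export=True, compile=False)
   pipe.to("CPU")   # or any other device OpenVINO reports, e.g. "GPU"
   pipe.compile()
   image = pipe("valley in the Alps at sunset, epic vista").images[0]
   image.save("result.png")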
**Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/239-image-bind-convert-with-output.rst b/docs/notebooks/239-image-bind-convert-with-output.rst index a0bc6ab8d44384..a9354866a28de0 100644 --- a/docs/notebooks/239-image-bind-convert-with-output.rst +++ b/docs/notebooks/239-image-bind-convert-with-output.rst @@ -1,7 +1,7 @@ Binding multimodal data using ImageBind and OpenVINO ==================================================== -.. _top: + Exploring the surrounding world, people get information using multiple senses, for example, seeing a busy street and hearing the sounds of car @@ -69,6 +69,8 @@ represented on the image below: In this tutorial, we consider how to use ImageBind for multimodal zero-shot classification. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/240-dolly-2-instruction-following-with-output.rst b/docs/notebooks/240-dolly-2-instruction-following-with-output.rst index bbc1e2401599b8..9b450eb9902ce5 100644 --- a/docs/notebooks/240-dolly-2-instruction-following-with-output.rst +++ b/docs/notebooks/240-dolly-2-instruction-following-with-output.rst @@ -1,7 +1,7 @@ Instruction following using Databricks Dolly 2.0 and OpenVINO ============================================================= -.. _top: + Instruction following is one of the cornerstones of the current generation of large language models (LLMs). Reinforcement learning with @@ -82,6 +82,8 @@ post `__ +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/241-riffusion-text-to-music-with-output.rst b/docs/notebooks/241-riffusion-text-to-music-with-output.rst index cae9b6e81d19d0..d8eb9cb1462095 100644 --- a/docs/notebooks/241-riffusion-text-to-music-with-output.rst +++ b/docs/notebooks/241-riffusion-text-to-music-with-output.rst @@ -1,7 +1,7 @@ Text-to-Music generation using Riffusion and OpenVINO ===================================================== -.. _top: + `Riffusion `__ is a latent text-to-image diffusion model capable of generating spectrogram @@ -76,6 +76,8 @@ The STFT is invertible, so the original audio can be reconstructed from a spectrogram. This idea is behind the approach of using Riffusion for audio generation. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/242-freevc-voice-conversion-with-output.rst b/docs/notebooks/242-freevc-voice-conversion-with-output.rst index 99396b5017a3f8..0a372bf31c85b7 100644 --- a/docs/notebooks/242-freevc-voice-conversion-with-output.rst +++ b/docs/notebooks/242-freevc-voice-conversion-with-output.rst @@ -1,7 +1,7 @@ High-Quality Text-Free One-Shot Voice Conversion with FreeVC and OpenVINO™ ========================================================================== -.. _top: + `FreeVC `__ allows altering the voice of a source speaker to a target style, while keeping the linguistic content @@ -30,6 +30,8 @@ devices. It consists of the following steps: - Convert models to OpenVINO Intermediate Representation. - Inference using only OpenVINO’s IR models. +.. _top: +
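Riffusion's audio step relies on the STFT being invertible, as noted above; because the spectrogram image keeps magnitudes only, the discarded phase has to be estimated, for example with Griffin-Lim. A sketch (the STFT parameters are illustrative, not the notebook's exact settings):

.. code:: python

   import numpy as np
   import librosa

   def spectrogram_to_audio(magnitudes: np.ndarray) -> np.ndarray:
       """magnitudes: [freq_bins, frames], decoded from the spectrogram image."""
       # Griffin-Lim iteratively estimates the phase the image format discards.
       return librosa.griffinlim(magnitudes, n_iter=32, n_fft=2048, hop_length=512)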
**Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/243-tflite-selfie-segmentation-with-output.rst b/docs/notebooks/243-tflite-selfie-segmentation-with-output.rst index af045b0b71759c..f989c47a6e0bd8 100644 --- a/docs/notebooks/243-tflite-selfie-segmentation-with-output.rst +++ b/docs/notebooks/243-tflite-selfie-segmentation-with-output.rst @@ -1,7 +1,7 @@ Selfie Segmentation using TFLite and OpenVINO ============================================= -.. _top: + The Selfie segmentation pipeline allows developers to easily separate the background from users within a scene and focus on what matters. @@ -36,6 +36,8 @@ The tutorial consists of the following steps: 2. Run inference on the image. 3. Run interactive background blurring demo on video. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/244-named-entity-recognition-with-output.rst b/docs/notebooks/244-named-entity-recognition-with-output.rst index 40dcb1455d73b3..dd6af58fd7bc13 100644 --- a/docs/notebooks/244-named-entity-recognition-with-output.rst +++ b/docs/notebooks/244-named-entity-recognition-with-output.rst @@ -1,7 +1,7 @@ Named entity recognition with OpenVINO™ ======================================= -.. _top: + Named Entity Recognition (NER) is a natural language processing method that involves detecting key information in the @@ -27,6 +27,8 @@ To simplify the user experience, the `Hugging Face Optimum `__ library is used to convert the model to OpenVINO™ IR format and quantize it. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/248-stable-diffusion-xl-with-output.rst b/docs/notebooks/248-stable-diffusion-xl-with-output.rst index 457c66ce5399b9..594fb4f1a7b6e7 100644 --- a/docs/notebooks/248-stable-diffusion-xl-with-output.rst +++ b/docs/notebooks/248-stable-diffusion-xl-with-output.rst @@ -1,7 +1,7 @@ Image generation with Stable Diffusion XL and OpenVINO ====================================================== -.. _top: + Stable Diffusion XL or SDXL is the latest image generation model that is tailored towards more photorealistic outputs with more detailed imagery @@ -67,6 +67,8 @@ The tutorial consists of the following steps: Some demonstrated models can require at least 64GB RAM for conversion and running. +.. _top: + **Table of contents**: - `Install Prerequisites <#install-prerequisites>`__ diff --git a/docs/notebooks/250-music-generation-with-output.rst b/docs/notebooks/250-music-generation-with-output.rst index 89894e4d0d6eac..564fe33f99f9e1 100644 --- a/docs/notebooks/250-music-generation-with-output.rst +++ b/docs/notebooks/250-music-generation-with-output.rst @@ -1,7 +1,7 @@ Controllable Music Generation with MusicGen and OpenVINO ======================================================== -.. _top: + MusicGen is a single-stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions @@ -32,6 +32,8 @@ We will use a model implementation from the `Hugging Face Transformers `__ library. +.. _top: +
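The background-blurring step of the selfie-segmentation demo above is plain alpha compositing with the predicted mask. A sketch (assumes the mask is already resized to the frame and scaled to [0, 1]):

.. code:: python

   import cv2
   import numpy as np

   def blur_background(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
       """Composite the person over a blurred copy of the frame."""
       blurred = cv2.GaussianBlur(frame, (55, 55), 0)
       alpha = mask[..., np.newaxis]  # broadcast the mask over the color channels
       return (alpha * frame + (1.0 - alpha) * blurred).astype(np.uint8)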
**Table of contents**: - `Requirements and Imports <#prerequisites>`__ diff --git a/docs/notebooks/251-tiny-sd-image-generation-with-output.rst b/docs/notebooks/251-tiny-sd-image-generation-with-output.rst index f8043dfe5526bb..b2afd5f5c58864 100644 --- a/docs/notebooks/251-tiny-sd-image-generation-with-output.rst +++ b/docs/notebooks/251-tiny-sd-image-generation-with-output.rst @@ -1,7 +1,7 @@ Image Generation with Tiny-SD and OpenVINO™ =========================================== -.. _top: + In recent times, the AI community has witnessed a remarkable surge in the development of larger and more performant language models, such as @@ -41,7 +41,9 @@ The notebook contains the following steps: 3. Run Inference pipeline with OpenVINO. 4. Run Interactive demo for Tiny-SD model -**Table of content**: +.. _toc: + +**Table of contents**: - `Prerequisites <#prerequisites>`__ - `Create PyTorch Models pipeline <#create-pytorch-models-pipeline>`__ diff --git a/docs/notebooks/252-fastcomposer-image-generation-with-output.rst b/docs/notebooks/252-fastcomposer-image-generation-with-output.rst index 891e1dd364663c..d0c9a479aa0f06 100644 --- a/docs/notebooks/252-fastcomposer-image-generation-with-output.rst +++ b/docs/notebooks/252-fastcomposer-image-generation-with-output.rst @@ -1,7 +1,7 @@ `FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention `__ ===================================================================================================================== -.. _top: + FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling @@ -32,6 +32,8 @@ different styles, actions, and contexts. drivers in the system - changes to have compatibility with transformers >= 4.30.1 (due to security vulnerability) +.. _top: + **Table of contents**: - `Install Prerequisites <#install-prerequisites>`__ diff --git a/docs/notebooks/253-zeroscope-text2video-with-output.rst b/docs/notebooks/253-zeroscope-text2video-with-output.rst index 4a538a6a8fc401..549a1ce04e5bfa 100644 --- a/docs/notebooks/253-zeroscope-text2video-with-output.rst +++ b/docs/notebooks/253-zeroscope-text2video-with-output.rst @@ -1,7 +1,7 @@ Video generation with ZeroScope and OpenVINO ============================================ -.. _top: + The ZeroScope model is a free and open-source text-to-video model that can generate realistic and engaging videos from text descriptions. It is @@ -34,6 +34,8 @@ Both versions of the ZeroScope model are available on Hugging Face: We will use the first one. +.. _top: + **Table of contents**: - `Install and import required packages <#install-and-import-required-packages>`__ diff --git a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst index f344a1ff9242ad..bce43dba73904d 100644 --- a/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst +++ b/docs/notebooks/301-tensorflow-training-openvino-nncf-with-output.rst @@ -11,6 +11,8 @@ A custom dataloader and metric will be defined, and accuracy and performance will be computed for the original IR model and the quantized model. +.. _top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/301-tensorflow-training-openvino-with-output.rst b/docs/notebooks/301-tensorflow-training-openvino-with-output.rst index 1f854249fd87d8..6bb74ee2b89df4 100644 --- a/docs/notebooks/301-tensorflow-training-openvino-with-output.rst +++ b/docs/notebooks/301-tensorflow-training-openvino-with-output.rst @@ -1,6 +1,8 @@ From Training to Deployment with TensorFlow and OpenVINO™ ========================================================= + + .. _top: **Table of contents**: diff --git a/docs/notebooks/302-pytorch-quantization-aware-training-with-output.rst b/docs/notebooks/302-pytorch-quantization-aware-training-with-output.rst index ecaad0a545e8c4..0448bcbd2c1793 100644 --- a/docs/notebooks/302-pytorch-quantization-aware-training-with-output.rst +++ b/docs/notebooks/302-pytorch-quantization-aware-training-with-output.rst @@ -1,7 +1,7 @@ Quantization Aware Training with NNCF, using PyTorch framework ============================================================== -.. _top: + This notebook is based on `ImageNet training in PyTorch `__. @@ -34,6 +34,8 @@ hub `__. This notebook requires a C++ compiler. +.. _top: + **Table of contents**: - `Imports and Settings <#imports-and-settings>`__ diff --git a/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst b/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst index b9bbf0325d64e3..7d6e7934675d69 100644 --- a/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst +++ b/docs/notebooks/305-tensorflow-quantization-aware-training-with-output.rst @@ -1,7 +1,7 @@ Quantization Aware Training with NNCF, using TensorFlow Framework ================================================================= -.. _top: + The goal of this notebook is to demonstrate how to use the Neural Network Compression Framework `NNCF `__ @@ -23,6 +23,8 @@ Imagenette is a subset of 10 easily classified classes from the ImageNet dataset. Using the smaller model and dataset will speed up training and download time. +.. _top: + **Table of contents**: - `Imports and Settings <#imports-and-settings>`__ diff --git a/docs/notebooks/401-object-detection-with-output.rst b/docs/notebooks/401-object-detection-with-output.rst index f1a947ae961afa..da6f2e47f99c40 100644 --- a/docs/notebooks/401-object-detection-with-output.rst +++ b/docs/notebooks/401-object-detection-with-output.rst @@ -1,7 +1,7 @@ Live Object Detection with OpenVINO™ ==================================== -.. _top: + This notebook demonstrates live object detection with OpenVINO, using the `SSDLite @@ -17,6 +17,8 @@ Additionally, you can also upload a video file. with a webcam. If you run the notebook on a server, the webcam will not work. However, you can still do inference on a video. +.. _top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/402-pose-estimation-with-output.rst b/docs/notebooks/402-pose-estimation-with-output.rst index fbee0c5e4708fa..efe0ffcdd5564c 100644 --- a/docs/notebooks/402-pose-estimation-with-output.rst +++ b/docs/notebooks/402-pose-estimation-with-output.rst @@ -1,7 +1,7 @@ Live Human Pose Estimation with OpenVINO™ ========================================= -.. _top: + This notebook demonstrates live pose estimation with OpenVINO, using the OpenPose @@ -18,6 +18,8 @@ Additionally, you can also upload a video file. work. However, you can still do inference on a video in the final step. +.. _top: +
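The two quantization-aware-training notebooks above follow the same NNCF pattern: wrap the floating-point model so fake-quantize operations are inserted, fine-tune briefly, then export. A PyTorch-side sketch (the network, data and shapes are stand-ins for the notebooks' own):

.. code:: python

   import torch
   from torchvision.models import resnet18
   from nncf import NNCFConfig
   from nncf.torch import create_compressed_model, register_default_init_args

   model = resnet18(weights=None)  # stand-in; the notebook trains its own network
   data = [(torch.randn(1, 3, 224, 224), torch.tensor([0]))]  # stand-in loader contents
   loader = torch.utils.data.DataLoader(data, batch_size=None)

   config = NNCFConfig.from_dict({
       "input_info": {"sample_size": [1, 3, 224, 224]},
       "compression": {"algorithm": "quantization"},  # inserts fake-quantize ops
   })
   config = register_default_init_args(config, loader)  # calibrates initial ranges
   ctrl, quant_model = create_compressed_model(model, config)
   # ...fine-tune quant_model with the usual training loop, then:
   ctrl.export_model("model_int8.onnx")  # ready for conversion to OpenVINO IR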
**Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/403-action-recognition-webcam-with-output.rst b/docs/notebooks/403-action-recognition-webcam-with-output.rst index d0cb4b74b57b00..d6755518701ca1 100644 --- a/docs/notebooks/403-action-recognition-webcam-with-output.rst +++ b/docs/notebooks/403-action-recognition-webcam-with-output.rst @@ -1,7 +1,7 @@ Human Action Recognition with OpenVINO™ ======================================= -.. _top: + This notebook demonstrates live human action recognition with OpenVINO, using the `Action Recognition @@ -39,6 +39,8 @@ Transformer and `ResNet34 `__. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/404-style-transfer-with-output.rst b/docs/notebooks/404-style-transfer-with-output.rst index 7c5d9c1022830d..630aca385b8d84 100644 --- a/docs/notebooks/404-style-transfer-with-output.rst +++ b/docs/notebooks/404-style-transfer-with-output.rst @@ -1,7 +1,7 @@ Style Transfer with OpenVINO™ ============================= -.. _top: + This notebook demonstrates style transfer with OpenVINO, using the Style Transfer Models from `ONNX Model @@ -32,6 +32,8 @@ Additionally, you can also upload a video file. but you can run inference, using a video file. +.. _top: + **Table of contents**: - `Preparation <#preparation>`__ diff --git a/docs/notebooks/405-paddle-ocr-webcam-with-output.rst b/docs/notebooks/405-paddle-ocr-webcam-with-output.rst index 608a9d4ab58f4d..8f11e078ae975e 100644 --- a/docs/notebooks/405-paddle-ocr-webcam-with-output.rst +++ b/docs/notebooks/405-paddle-ocr-webcam-with-output.rst @@ -1,7 +1,7 @@ PaddleOCR with OpenVINO™ ======================== -.. _top: + This demo shows how to run the PP-OCR model on OpenVINO natively. Instead of exporting the PaddlePaddle model to ONNX and then converting to the @@ -25,6 +25,8 @@ the PaddleOCR is as follows: with a webcam. If you run the notebook on a server, the webcam will not work. You can still do inference on a video file. +.. _top: + **Table of contents**: - `Imports <#imports>`__ diff --git a/docs/notebooks/406-3D-pose-estimation-with-output.rst b/docs/notebooks/406-3D-pose-estimation-with-output.rst index 2221e0945429eb..4cef53e5c7fb38 100644 --- a/docs/notebooks/406-3D-pose-estimation-with-output.rst +++ b/docs/notebooks/406-3D-pose-estimation-with-output.rst @@ -1,7 +1,7 @@ Live 3D Human Pose Estimation with OpenVINO =========================================== -.. _top: + This notebook demonstrates live 3D Human Pose Estimation with OpenVINO via a webcam. We utilize the model @@ -30,6 +30,8 @@ To ensure that the results are displayed correctly, run the code in a recommended browser on one of the following operating systems: Ubuntu, Windows: Chrome, macOS: Safari. +.. _top: + **Table of contents**: - `Prerequisites <#prerequisites>`__ diff --git a/docs/notebooks/407-person-tracking-with-output.rst b/docs/notebooks/407-person-tracking-with-output.rst index ed99df703473b8..9e11051c3f436c 100644 --- a/docs/notebooks/407-person-tracking-with-output.rst +++ b/docs/notebooks/407-person-tracking-with-output.rst @@ -1,7 +1,7 @@ Person Tracking with OpenVINO™ ============================== -.. _top: + This notebook demonstrates live person tracking with OpenVINO: it reads frames from an input video sequence, detects people in the frames, @@ -95,6 +95,8 @@ realtime tracking,” in ICIP, 2016, pp. 3464–3468. .. |deepsort| image:: https://user-images.githubusercontent.com/91237924/221744683-0042eff8-2c41-43b8-b3ad-b5929bafb60b.png +.. _top: +
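All of the live demos above share one inference loop: grab a frame from a webcam or a video file, preprocess it, infer, and draw the results. A minimal OpenVINO sketch (the model path, input size and result drawing are assumptions):

.. code:: python

   import cv2
   from openvino.runtime import Core

   compiled = Core().compile_model("model.xml", "AUTO")  # path is an assumption
   output = compiled.output(0)

   source = 0  # webcam index; use a path such as "input.mp4" on a server
   cap = cv2.VideoCapture(source)
   while cap.isOpened():
       ok, frame = cap.read()
       if not ok:
           break  # end of stream
       blob = cv2.dnn.blobFromImage(frame, size=(256, 256))  # NCHW; size is model-specific
       results = compiled([blob])[output]
       # ...draw `results` onto `frame` here...
       cv2.imshow("demo", frame)
       if cv2.waitKey(1) == 27:  # Esc
           break
   cap.release()
   cv2.destroyAllWindows()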
**Table of contents**: - `Imports <#imports>`__ diff --git a/docs/tutorials.md b/docs/tutorials.md index a4fa0ed98cb073..c21005bab47bd4 100644 --- a/docs/tutorials.md +++ b/docs/tutorials.md @@ -131,6 +131,15 @@ Tutorials that explain how to optimize and quantize models with OpenVINO tools. +----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ | `120-tensorflow-object-detection-to-openvino `__ |br| |n120| |br| |c120| | Convert TensorFlow Object Detection models to OpenVINO IR | +----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ + | `122-speech-recognition-quantization-wav2vec2 `__ | Quantize Speech Recognition Models with accuracy control using NNCF PTQ API. | + +----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ + | `122-yolov8-quantization-with-accuracy-control `__ | Quantize YOLOv8 with accuracy control using NNCF PTQ API. | + +----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ + + + + + Model Demos
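The two table rows added above advertise quantization with accuracy control, where NNCF rolls back quantization on the layers that hurt a user-supplied metric most, until the drop fits a budget. A sketch of that API (the datasets, transform function and metric are assumptions):

.. code:: python

   import nncf
   from openvino.runtime import Core

   model = Core().read_model("model.xml")  # FP32 IR; path is an assumption

   def transform_fn(item):
       return item  # map a dataset item to model inputs; task-specific in practice

   def validate(compiled_model, validation_items) -> float:
       ...  # return the metric (e.g. mAP or accuracy) that max_drop is checked against

   quantized = nncf.quantize_with_accuracy_control(
       model,
       calibration_dataset=nncf.Dataset(calib_items, transform_fn),  # calib_items assumed
       validation_dataset=nncf.Dataset(val_items, transform_fn),     # val_items assumed
       validation_fn=validate,
       max_drop=0.01,  # tolerate at most this drop in the metric
   )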