
InferenceModel refactor for compile/export support #77

Open · talmo opened this issue Aug 14, 2024 · 0 comments
talmo commented Aug 14, 2024

Overview

We want to be able to compile and export trained inference models for usage outside of sleap-nn.

Background

The logic for inference is broken down into:

  1. Data loading: I/O (VideoReader, LabelsReader)
  2. Data preprocessing: moving to GPU, normalization, batching, etc.
  3. Model forward pass
  4. Postprocessing: peak finding, PAF grouping, etc.

Right now, some of these ops are spread inconsistently across the Predictor classes and the underlying torch.nn.Modules.

To best support workflows where we compile/export the final model for inference-only workloads, we need to fold steps 2-4 into the inference model itself (as is done in core SLEAP).
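
As a rough illustration of what that bundling could look like (the class name and the toy peak finder here are hypothetical, not the actual sleap-nn API), a single torch.nn.Module can own steps 2-4 so that everything after frame reading is traceable and exportable:

import torch
import torch.nn as nn


class ExportableInferenceModel(nn.Module):
    """Hypothetical wrapper bundling preprocessing, the model forward pass,
    and peak finding into one traceable forward() (illustrative sketch only)."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # e.g., a confidence map model

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # Step 2: on-device preprocessing (uint8 -> float32 in [0, 1]).
        x = frames.to(torch.float32) / 255.0

        # Step 3: model forward pass -> confidence maps of shape (n, c, h, w).
        cms = self.backbone(x)

        # Step 4: postprocessing; a trivial global argmax peak finder stands in
        # for the real peak finding / PAF grouping logic.
        n, c, h, w = cms.shape
        flat_idx = cms.reshape(n, c, -1).argmax(dim=-1)
        peaks_y = torch.div(flat_idx, w, rounding_mode="floor").to(torch.float32)
        peaks_x = (flat_idx % w).to(torch.float32)
        return torch.stack([peaks_x, peaks_y], dim=-1)  # (n, c, 2) peak coordinates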

There are two reasons for this:

  1. Performance: vectorized tensor ops like normalization are much faster on the GPU, and we avoid the overhead of transferring float32 data from the CPU. Additionally, inference engines like torch.compile and TensorRT can yield dramatic speedups when the hardware supports them (see the sketch after this list).
  2. Portability: an exported artifact can run those ops on its own, without us having to ship pre/post-processing instructions or implementation-dependent details specific to sleap-nn. This will be useful for building web demos, realtime inference, and more.
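
As a concrete (hypothetical) example of the performance point: once steps 2-4 live inside a single module, compiling it is a one-liner. In the sketch below, model stands for the bundled module sketched above and the frame shape is a placeholder:

import torch

# model is assumed to be a single nn.Module containing steps 2-4
# (e.g., the ExportableInferenceModel sketch above), already moved to the GPU.
compiled = torch.compile(model, mode="reduce-overhead")

frames = torch.randint(0, 256, (4, 1, 384, 384), dtype=torch.uint8, device="cuda")
peaks = compiled(frames)  # first call triggers compilation; later calls reuse it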

Ultralytics is a gold-standard example of this; it supports a huge number of export formats:

def export_formats():
    """YOLOv8 export formats."""
    import pandas  # scope for faster 'import ultralytics'

    x = [
        ["PyTorch", "-", ".pt", True, True],
        ["TorchScript", "torchscript", ".torchscript", True, True],
        ["ONNX", "onnx", ".onnx", True, True],
        ["OpenVINO", "openvino", "_openvino_model", True, False],
        ["TensorRT", "engine", ".engine", False, True],
        ["CoreML", "coreml", ".mlpackage", True, False],
        ["TensorFlow SavedModel", "saved_model", "_saved_model", True, True],
        ["TensorFlow GraphDef", "pb", ".pb", True, True],
        ["TensorFlow Lite", "tflite", ".tflite", True, False],
        ["TensorFlow Edge TPU", "edgetpu", "_edgetpu.tflite", True, False],
        ["TensorFlow.js", "tfjs", "_web_model", True, False],
        ["PaddlePaddle", "paddle", "_paddle_model", True, True],
        ["NCNN", "ncnn", "_ncnn_model", True, True],
    ]
    return pandas.DataFrame(x, columns=["Format", "Argument", "Suffix", "CPU", "GPU"])

(ref)

Some of these formats can represent more complex ops than others; the more expressive ones are the best fit for our needs.

Our goal will be to implement support for (see the ONNX export sketch after this list):

  • Required: TensorRT, ONNX
  • Nice to have: torch.compile, CoreML, TF SavedModel/GraphDef/Lite/JS, OpenVINO
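
For the required formats, one path is exporting to ONNX with torch.onnx.export and then building a TensorRT engine from the ONNX file. A minimal sketch, assuming model is the bundled inference module (steps 2-4) and using placeholder shapes and tensor names:

import torch

# model is assumed to be the bundled inference module (steps 2-4) in eval mode.
dummy = torch.randint(0, 256, (1, 1, 384, 384), dtype=torch.uint8)

torch.onnx.export(
    model,
    dummy,
    "inference_model.onnx",
    input_names=["frames"],
    output_names=["peaks"],
    dynamic_axes={"frames": {0: "batch"}, "peaks": {0: "batch"}},
    opset_version=17,
)

# The resulting .onnx can then be built into a TensorRT engine, e.g. with:
#   trtexec --onnx=inference_model.onnx --saveEngine=inference_model.engine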

Likely, we'll need to adapt to the nuances of each inference runtime framework (TensorRT is notoriously picky), which will impose a particular modularization of the inference steps above. Examples of potential pitfalls:

  • Not supporting variable-length shapes (meaning we need to implement padding logic; see the padding sketch below)
  • Not supporting autographable ops (TF)
  • Not supporting custom data types (e.g., Inference results data structures #46)
  • Not supporting cropping in the middle of the pipeline (e.g., top-down)

Even in cases where a framework does support everything, we may need to express ops in a particular way for the conversion to work (e.g., a resizing op might support nearest-neighbor but not bilinear interpolation).
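
For example, a runtime that only accepts static input shapes forces us to pad frames to a fixed size before they enter the graph. A minimal sketch (the helper name and target size are made up, not part of sleap-nn):

import torch
import torch.nn.functional as F


def pad_to_fixed(frames: torch.Tensor, height: int = 1024, width: int = 1024) -> torch.Tensor:
    """Pad (n, c, h, w) frames to a fixed spatial size for runtimes that
    require static input shapes (hypothetical helper)."""
    n, c, h, w = frames.shape
    pad_h = max(height - h, 0)
    pad_w = max(width - w, 0)
    # F.pad pads the last two dims in the order (left, right, top, bottom).
    return F.pad(frames, (0, pad_w, 0, pad_h), mode="constant", value=0)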

Examples

  • Ultralytics
  • metrabs: SavedModel export lets you do something like this to use a trained model without installing any special dependencies:
    import tensorflow as tf
    import tensorflow_hub as tfhub
    
    model = tfhub.load('https://bit.ly/metrabs_l')
    image = tf.image.decode_jpeg(tf.io.read_file('img/test_image_3dpw.jpg'))
    pred = model.detect_poses(image)
  • tfjs-model/pose-detection: Running the model in the browser in JavaScript via TF.JS
  • wonnx: WebGPU-based ONNX inference with hardware acceleration in the browser (with multiplatform support), no CUDA required!

PRs

TODO
