
OCI Artifact for ML model & metadata


This project is a collection of blueprints, patterns, and a toolchain (in the form of a Python SDK and CLI) for leveraging OCI Artifacts and containers for ML models and their metadata.

Documentation: https://containers.github.io/omlmd

GitHub repository: https://github.com/containers/omlmd
YouTube video playlist: https://www.youtube.com/watch?v=W4GwIRPXE8E&list=PLdbdefeRIj9SRbg6Hkr15GeyPH0qpk_ww
PyPI distribution: https://pypi.org/project/omlmd

Installation

Tip

We recommend checking out the Getting Started tutorial in the documentation; the instructions below are provided as a quick overview.

In your Python environment, use:

pip install omlmd

Push

Store the ML model file model.joblib and its metadata in the OCI registry at localhost:8080:

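from omlmd.helpers import Helper

# The Helper instance is named omlmd and reused in the examples below
omlmd = Helper()
# Keyword arguments after the file path supply the model metadata
omlmd.push("localhost:8080/matteo/ml-artifact:latest", "model.joblib", name="Model Example", author="John Doe", license="Apache-2.0", accuracy=9.876543210)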
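Under the hood, an OCI artifact pairs a manifest with content-addressed blobs. The following is a minimal sketch, using only the standard library, of what a manifest for a model file plus its metadata could look like: the metadata is serialized as the config blob, and the model file becomes a layer with media type application/x-mlmodel. The config media type application/x-config, the customProperties wrapper, and the helper names descriptor and build_manifest are assumptions for illustration, not necessarily what omlmd actually produces.

```python
from __future__ import annotations

import hashlib
import json


def descriptor(media_type: str, data: bytes, annotations: dict | None = None) -> dict:
    """Build an OCI content descriptor (mediaType, digest, size) for a blob."""
    desc = {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }
    if annotations:
        desc["annotations"] = annotations
    return desc


def build_manifest(model_bytes: bytes, model_filename: str, metadata: dict) -> tuple[dict, bytes]:
    """Return (manifest, config_bytes) for a model file plus its metadata.

    Hypothetical layout: metadata goes into the config blob, and the
    model file is the single layer.
    """
    config_bytes = json.dumps({"customProperties": metadata}, sort_keys=True).encode()
    manifest = {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        # Hypothetical config media type; omlmd may use a different one.
        "config": descriptor("application/x-config", config_bytes),
        "layers": [
            descriptor(
                "application/x-mlmodel",
                model_bytes,
                {"org.opencontainers.image.title": model_filename},
            )
        ],
    }
    return manifest, config_bytes


# Placeholder bytes stand in for a real model.joblib file.
manifest, config_bytes = build_manifest(b"fake model bytes", "model.joblib", {"accuracy": 9.876543210})
print(json.dumps(manifest, indent=2))
```

Because every blob is referenced by its sha256 digest, a client can verify what it downloads and fetch the metadata without touching the (potentially large) model layer.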

Pull

Fetch everything in a single pull:

omlmd.pull(target="localhost:8080/matteo/ml-artifact:latest", outdir="tmp/b")

Or fetch only the ML model assets:

omlmd.pull(target="localhost:8080/matteo/ml-artifact:latest", outdir="tmp/b", media_types=["application/x-mlmodel"])
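Conceptually, a media-type filter only has to select which layers of the manifest to download. The sketch below illustrates that selection step on a hypothetical in-memory manifest; the function name select_layers, the text/markdown layer, and the placeholder digests are assumptions for illustration, not omlmd internals.

```python
from __future__ import annotations


def select_layers(manifest: dict, media_types: list[str] | None = None) -> list[dict]:
    """Return the manifest layers whose mediaType is in media_types.

    With media_types=None every layer is selected, like a "fetch everything" pull.
    """
    layers = manifest.get("layers", [])
    if media_types is None:
        return list(layers)
    wanted = set(media_types)
    return [layer for layer in layers if layer.get("mediaType") in wanted]


# Hypothetical manifest with a model layer and a documentation layer;
# the digests are placeholders, not real sha256 values.
example_manifest = {
    "layers": [
        {"mediaType": "application/x-mlmodel", "digest": "sha256:aaa", "size": 10},
        {"mediaType": "text/markdown", "digest": "sha256:bbb", "size": 5},
    ]
}

model_only = select_layers(example_manifest, ["application/x-mlmodel"])
print([layer["digest"] for layer in model_only])  # ['sha256:aaa']
```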

Custom Pull: just metadata

These features can be composed to expose higher-level capabilities, such as retrieving only the metadata information. The implementation intends to follow the OCI Artifact conventions.

md = omlmd.get_config(target="localhost:8080/matteo/ml-artifact:latest")
print(md)
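Once retrieved, the metadata is plain JSON and can be consumed with the standard library. The sketch below assumes it is a JSON string shaped like the entries queried in the jq example further down (a reference plus a config object carrying customProperties); the sample values and the helper custom_property are hypothetical, and the real output of get_config may contain more or differently named fields.

```python
import json

# Hypothetical sample; the real metadata layout may differ.
sample_md = json.dumps({
    "reference": "localhost:8080/matteo/ml-artifact:latest",
    "config": {
        "name": "Model Example",
        "author": "John Doe",
        "customProperties": {"license": "Apache-2.0", "accuracy": 9.876543210},
    },
})


def custom_property(md_json: str, key: str, default=None):
    """Read one entry from config.customProperties in a metadata JSON string."""
    md = json.loads(md_json)
    return md.get("config", {}).get("customProperties", {}).get(key, default)


print(custom_property(sample_md, "accuracy"))
```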

Crawl

Client-side crawling of metadata.

Note: a server-side analogue is coming soon; see the reference in the blueprints.

crawl_result = omlmd.crawl([
    "localhost:8080/matteo/ml-artifact:v1",
    "localhost:8080/matteo/ml-artifact:v2",
    "localhost:8080/matteo/ml-artifact:v3"
])
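Client-side crawling amounts to fetching the metadata of each reference and combining the results into one JSON array. The sketch below shows that pattern with an injectable fetch function, so it runs offline against a hypothetical in-memory stand-in for a registry; the names crawl, fake_registry and fake_fetch and the accuracy values are illustrative assumptions, not omlmd's implementation.

```python
import json
from typing import Callable, Iterable


def crawl(targets: Iterable[str], fetch_config: Callable[[str], str]) -> str:
    """Fetch the metadata JSON of each target and combine it into one JSON array string.

    fetch_config is any callable returning the metadata JSON for one reference
    (for example, a wrapper around a registry client).
    """
    entries = [json.loads(fetch_config(target)) for target in targets]
    return json.dumps(entries)


# Hypothetical offline stand-in for a registry, keyed by reference.
fake_registry = {
    f"localhost:8080/matteo/ml-artifact:v{i}": {"customProperties": {"accuracy": acc}}
    for i, acc in [(1, 0.81), (2, 0.93), (3, 0.88)]
}


def fake_fetch(ref: str) -> str:
    # Shape each entry like the ones queried in the jq example.
    return json.dumps({"reference": ref, "config": fake_registry[ref]})


crawl_result = crawl(sorted(fake_registry), fake_fetch)
print(crawl_result)
```

Injecting the fetch function keeps the crawling logic independent of how metadata is retrieved, which also makes it easy to test without a running registry.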

Example query

This demonstrates integrating the crawl results with a query tool (in this case, jq).

Of the crawled ML OCI artifacts, which one exhibits the highest accuracy?

import jq
# Select the reference of the crawled artifact with the highest accuracy
jq.compile("max_by(.config.customProperties.accuracy).reference").input_text(crawl_result).first()
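If jq is not available, the same question can be answered with the standard library. This is a minimal sketch assuming crawl_result is a JSON array string of entries with a reference and config.customProperties.accuracy, as the jq expression implies; the function name best_by_accuracy and the sample data are hypothetical.

```python
import json


def best_by_accuracy(crawl_result: str) -> str:
    """Return the reference of the entry with the highest
    config.customProperties.accuracy, mirroring the jq max_by query."""
    entries = json.loads(crawl_result)
    best = max(entries, key=lambda e: e["config"]["customProperties"]["accuracy"])
    return best["reference"]


# Hypothetical crawl result with three versions of the artifact.
example = json.dumps([
    {"reference": "localhost:8080/matteo/ml-artifact:v1", "config": {"customProperties": {"accuracy": 0.81}}},
    {"reference": "localhost:8080/matteo/ml-artifact:v2", "config": {"customProperties": {"accuracy": 0.93}}},
    {"reference": "localhost:8080/matteo/ml-artifact:v3", "config": {"customProperties": {"accuracy": 0.88}}},
])

print(best_by_accuracy(example))  # localhost:8080/matteo/ml-artifact:v2
```

Unlike jq, which treats a missing value as null, this version raises KeyError when an entry has no accuracy property.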

To be continued...

Don't forget to check out the documentation website for more information!