Models in r2 rw support #9490
@shichengzhou-db Thank you for the contribution! Could you fix the following issue(s)?
DCO check: The DCO check failed. Please sign off your commit(s). See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.
Documentation preview for dc8c6ac will be available when the CircleCI job completes successfully.
Looks great @shichengzhou-db! Were you also able to test loading the model that you registered to R2, e.g. via mlflow.pyfunc.load_model?
@jerrylian-db FYI, there were some gotchas we ran into (e.g. needing to use virtual-host-style addressing) to get reading/writing model artifacts to R2 working with UC temporary credentials. You might want to work with Shicheng to test that R2 model read/write still works once you have a PR ready to refactor S3ArtifactRepo to share the fast upload/download logic in DatabricksArtifactRepo.
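For readers unfamiliar with the addressing gotcha mentioned above, here is a minimal sketch of how the two S3 addressing styles place the bucket name in the request URL. The helper `build_object_url` and the example host are hypothetical and are not part of this PR; with boto3, the style is normally selected through `botocore.config.Config(s3={"addressing_style": ...})` rather than by building URLs by hand.

```python
from urllib.parse import quote


def build_object_url(endpoint_host, bucket, key, addressing_style="virtual"):
    """Return the HTTPS URL of an object under the given addressing style.

    Hypothetical helper for illustration only; real clients build this URL
    internally based on their configured addressing style.
    """
    key = quote(key.lstrip("/"))  # keep "/" separators, escape other characters
    if addressing_style == "virtual":
        # Virtual-host style: the bucket is part of the host name.
        return f"https://{bucket}.{endpoint_host}/{key}"
    if addressing_style == "path":
        # Path style: the bucket is the first path segment.
        return f"https://{endpoint_host}/{bucket}/{key}"
    raise ValueError(f"Unknown addressing style: {addressing_style}")


host = "acct123.r2.cloudflarestorage.com"  # assumed example endpoint
print(build_object_url(host, "my-bucket", "models/m1/MLmodel"))
print(build_object_url(host, "my-bucket", "models/m1/MLmodel", addressing_style="path"))
```

The two calls differ only in where `my-bucket` appears, which is why a client configured for the wrong style can fail against an endpoint that expects the other.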
    return bucket, path

def _get_s3_client(self, addressing_style="virtual"):
    return super()._get_s3_client(addressing_style=addressing_style)
You may need to run "black ." to fix lint errors (e.g. having newlines at EOF).
Signed-off-by: shichengzhou-db <[email protected]>
6ff7637 to 28a81f9
Yep, I loaded the model and ran a prediction; it works as expected (I added this manual test code to the notebook shared with you earlier).
Signed-off-by: shichengzhou-db <[email protected]>
28a81f9 to b9c1b4e
LGTM
@mlflow-automation autoformat
Signed-off-by: mlflow-automation <[email protected]>
if path.startswith("/"):
    path = path[1:]
Suggested change:
- if path.startswith("/"):
-     path = path[1:]
+ path = path.lstrip("/")
Can we use lstrip here?
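One thing worth checking before accepting the suggestion: the two forms are not strictly equivalent. The sketch below (function names `strip_one` and `strip_all` are hypothetical, for illustration only) shows that `path[1:]` removes a single leading slash while `lstrip("/")` removes every leading slash. For a path with exactly one leading slash, which is what `urlparse` typically yields for these URIs, both give the same result.

```python
def strip_one(path):
    # Original form: drop at most one leading "/".
    if path.startswith("/"):
        path = path[1:]
    return path


def strip_all(path):
    # Suggested form: drop every leading "/".
    return path.lstrip("/")


for candidate in ["/models/m1", "models/m1", "//models/m1"]:
    print(repr(candidate), "->", repr(strip_one(candidate)), "vs", repr(strip_all(candidate)))
```

Only the double-slash input distinguishes the two, so the choice mostly depends on whether such paths should be normalized.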
# r2 uri format(virtual): r2://<bucket-name>@<account-id>.r2.cloudflarestorage.com/<path>
parsed = urllib.parse.urlparse(uri)
if parsed.scheme != "r2":
    raise Exception(f"Not an R2 URI: {uri}")
Suggested change:
- raise Exception(f"Not an R2 URI: {uri}")
+ raise MlflowException.invalid_parameter_value(f"Not an R2 URI: {uri}")
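To make the URI format in the hunk above concrete, here is a minimal standalone sketch of the parsing step, pieced together from the fragments visible in this review. It is not the PR's actual implementation: `parse_r2_uri` and `InvalidR2UriError` are hypothetical names, and the error class merely stands in for `MlflowException.invalid_parameter_value` so the example runs without MLflow installed.

```python
from urllib.parse import urlparse


class InvalidR2UriError(ValueError):
    """Hypothetical stand-in for MlflowException.invalid_parameter_value."""


def parse_r2_uri(uri):
    """Split r2://<bucket-name>@<account-id>.r2.cloudflarestorage.com/<path>
    into a (bucket, path) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != "r2":
        raise InvalidR2UriError(f"Not an R2 URI: {uri}")
    # netloc looks like "<bucket-name>@<account-id>.r2.cloudflarestorage.com".
    # Assumption: the bucket is everything before the "@".
    bucket, _, _host = parsed.netloc.partition("@")
    path = parsed.path.lstrip("/")
    return bucket, path


print(parse_r2_uri("r2://my-bucket@acct123.r2.cloudflarestorage.com/models/m1"))
```

Raising a specific, typed error instead of a bare `Exception` lets callers distinguish a malformed URI from unrelated failures, which is the point of the suggested change.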
def __init__(
    self, artifact_uri, access_key_id=None, secret_access_key=None, session_token=None
):
    super().__init__(
        artifact_uri,
        access_key_id=access_key_id,
        secret_access_key=secret_access_key,
        session_token=session_token,
    )
Do we need to override __init__?
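The reasoning behind this question can be sketched with plain Python classes: when a subclass `__init__` only forwards the same arguments to `super().__init__`, removing it leaves behavior unchanged, because Python falls back to the inherited constructor. The class names below are hypothetical and do not mirror the real MLflow classes beyond the constructor signature shown in the hunk.

```python
class BaseRepo:
    # Hypothetical base class with the same constructor signature as the hunk.
    def __init__(
        self, artifact_uri, access_key_id=None, secret_access_key=None, session_token=None
    ):
        self.artifact_uri = artifact_uri
        self.access_key_id = access_key_id
        self.secret_access_key = secret_access_key
        self.session_token = session_token


class RepoWithOverride(BaseRepo):
    # Pure pass-through override: adds nothing beyond what BaseRepo does.
    def __init__(
        self, artifact_uri, access_key_id=None, secret_access_key=None, session_token=None
    ):
        super().__init__(
            artifact_uri,
            access_key_id=access_key_id,
            secret_access_key=secret_access_key,
            session_token=session_token,
        )


class RepoWithoutOverride(BaseRepo):
    # No __init__ defined: BaseRepo.__init__ is inherited as-is.
    pass


a = RepoWithOverride("r2://bucket@host/path", access_key_id="key")
b = RepoWithoutOverride("r2://bucket@host/path", access_key_id="key")
print(vars(a) == vars(b))
```

An override would only be needed if the subclass changed defaults, validated arguments, or set extra attributes.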
Signed-off-by: shichengzhou-db <[email protected]> Signed-off-by: mlflow-automation <[email protected]> Co-authored-by: mlflow-automation <[email protected]>
Signed-off-by: shichengzhou-db <[email protected]> Signed-off-by: mlflow-automation <[email protected]> Co-authored-by: mlflow-automation <[email protected]> Signed-off-by: Lu Peng <[email protected]>
Related Issues/PRs
N/A
What changes are proposed in this pull request?
How is this patch tested?
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
Support registering models in an R2 Storage-backed Databricks UC catalog.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/docs: MLflow documentation pages
area/examples: Example code
area/gateway: AI Gateway service, Gateway client APIs, third-party Gateway integrations
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
Interface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
Language
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
Integrations
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations
How should the PR be classified in the release notes? Choose one:
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes