Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Integrate Databricks SDK with Model Serving Auth Provider #761

Merged

Conversation

aravind-segu
Copy link
Contributor

Changes

This PR introduces a new model serving auth method to Databricks SDK.

  • If the correct environment variables are set to identify a model serving environment
  • Check to see if there is an oauth file written by the serving environment
  • If this file exists use the token here for authentication

Tests

Added Unit tests

  • make test run locally
  • make fmt applied
  • relevant integration tests applied

@aravind-segu aravind-segu changed the title Integrate Databricks SDK with Model Serving Auth Provider [Internal] Integrate Databricks SDK with Model Serving Auth Provider Sep 13, 2024
@aravind-segu aravind-segu reopened this Sep 13, 2024
Copy link

@smurching smurching left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, thanks @aravind-segu !

Copy link
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looking good, but we shouldn't throw exceptions from the credential strategy, otherwise DefaultCredentials will stop trying later auth mechanisms. This will be problematic if new strategies are added later. Otherwise, a few nits.

Were you able to test this out with a manual test?

try:
host, token = ModelServingAuthProvider.get_databricks_host_token()
if token is None:
raise ValueError("Unable to Authenticate using Model Serving Environment")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return None here if you're unable to authenticate. DefaultCredentials doesn't catch these exceptions, so this will interrupt the fallback flow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am raising an error here, but then catching it immediately and returning None after. I wanted to add an extra message to differentiate whether reading the token failed or whether the token read was itself none.

databricks/sdk/credentials_provider.py Outdated Show resolved Hide resolved
databricks/sdk/credentials_provider.py Show resolved Hide resolved
databricks/sdk/credentials_provider.py Outdated Show resolved Hide resolved
databricks/sdk/credentials_provider.py Show resolved Hide resolved
databricks/sdk/credentials_provider.py Outdated Show resolved Hide resolved

def inner() -> Dict[str, str]:
# Call here again to get the refreshed token
host, token = ModelServingAuthProvider.get_databricks_host_token()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is called with every request. Do we need to check the file every time? We incur the cost of opening a file & reading its contents on every request. Is there an expiry included in the token file? We could use that to determine whether we need to refresh the token.

I tested a simple benchmark:

import json
import timeit
import os

with open('sample.json', 'w') as f:
    json.dump([{'key': 'value'} for _ in range(100)], f)

def benchmark():
    with open('sample.json', 'r') as f:
        data = json.load(f)

print(f"Average time to open, read, and parse file: {timeit.timeit(benchmark, number=1000):.6f} seconds")

This came out to 0.07 seconds on my laptop. It might be nice to avoid this overhead, especially for latency-sensitive use-cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, added a 5 min cache

databricks/sdk/credentials_provider.py Outdated Show resolved Hide resolved
@aravind-segu
Copy link
Contributor Author

Mostly looking good, but we shouldn't throw exceptions from the credential strategy, otherwise DefaultCredentials will stop trying later auth mechanisms. This will be problematic if new strategies are added later. Otherwise, a few nits.

Were you able to test this out with a manual test?

Manual Test Endpoint Here: https://e2-dogfood.staging.cloud.databricks.com/ml/endpoints/agents_main-default-test_agent_09_16?o=6051921418418893

@aravind-segu
Copy link
Contributor Author

@mgyucht We discussed in the call about making this a oauth-credential-provider. However that requires a refresht_token we do not have. We can set that to None but what happens then when the expiry is hit.

In terms of direct ingress, I do not think we are going to be hitting Model Serving Direct Ingress Endpoints right now. The only thing that we could do is talk to a Foundational Model Serving Endpoint through the workspace client. Currently we already do that in our Mlflow code by sending a request directly.

WIth this change we will now start to use databricks sdk workspace client to make this call. We will use the code here. As it already worked before, I am assuming that it will work again as this directly calls the api client. Do we still need to integrate oauth-credential-provider now.

model_serving_auth_provider = ModelServingAuthProvider()
host, token = model_serving_auth_provider.get_databricks_host_token()
if token is None:
raise ValueError("Unable to Authenticate using Model Serving Environment since the Token is None")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't raise here right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see we catch it further down

Copy link

@smurching smurching left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last comments, then LGTM!

Copy link

@smurching smurching left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close, just a few small comments.

databricks/sdk/credentials_provider.py Outdated Show resolved Hide resolved
databricks/sdk/credentials_provider.py Show resolved Hide resolved
databricks/sdk/credentials_provider.py Outdated Show resolved Hide resolved
databricks/sdk/credentials_provider.py Outdated Show resolved Hide resolved
databricks/sdk/credentials_provider.py Outdated Show resolved Hide resolved
@aravind-segu aravind-segu force-pushed the aravind-segu/ModelServingAuthProvider branch 5 times, most recently from d001f9f to d24912b Compare September 18, 2024 16:35
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
Signed-off-by: aravind-segu <[email protected]>
@aravind-segu aravind-segu force-pushed the aravind-segu/ModelServingAuthProvider branch from d24912b to d4641b6 Compare September 18, 2024 16:36
@aravind-segu
Copy link
Contributor Author

jenkins merge

@mgyucht mgyucht changed the title [Internal] Integrate Databricks SDK with Model Serving Auth Provider [Feature] Integrate Databricks SDK with Model Serving Auth Provider Sep 18, 2024
@mgyucht mgyucht added this pull request to the merge queue Sep 18, 2024
Merged via the queue into databricks:main with commit 9d39254 Sep 18, 2024
15 checks passed
renaudhartert-db added a commit that referenced this pull request Sep 19, 2024
### New Features and Improvements

 * Integrate Databricks SDK with Model Serving Auth Provider ([#761](#761)).

### Bug Fixes

 * Add DataPlane docs to the index ([#764](#764)).
 * `mypy` error: Skipping analyzing "google": module is installed, but missing library stubs or py.typed marker ([#769](#769)).
github-merge-queue bot pushed a commit that referenced this pull request Sep 19, 2024
### New Features and Improvements

* Integrate Databricks SDK with Model Serving Auth Provider
([#761](#761)).


### Bug Fixes

* Add DataPlane docs to the index
([#764](#764)).
* `mypy` error: Skipping analyzing "google": module is installed, but
missing library stubs or py.typed marker
([#769](#769)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants