online-deployment issue "The Managed Inference service creation is taking longer than our normal time" #4088
Comments
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.
route to service team
Acknowledged - beginning investigation and will reach out for more information if needed.
Hi @bhushangholave! We found an error stating that the operation timed out waiting for an image build to complete. If you check the Azure portal, are you able to find an image build error under operation details?
AzureML does not have access to the image build log, for privacy reasons. The log is, however, available to the user from the storage account associated with the workspace, or from the container registry associated with the workspace (ACR task runs). A timeout during the image build usually means a poor environment specification: pip fails to resolve the conflicts among the user's dependencies within a reasonable time. The fix on the user's end is to inspect the dependencies and resolve the conflicts. The problem should be easy to reproduce locally by materializing the environment from the conda spec. A temporary workaround that might work is to pin pip<=20.2.4, which uses the older resolver that ignores some of the conflicts.
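To make the workaround above concrete, here is a minimal sketch of a conda spec with pip pinned to the older resolver. The file name, environment name, Python version, and package list are placeholders (assumptions, not taken from the reporter's deployment); substitute your own dependencies.

```shell
# Write a conda spec that pins pip at or below 20.2.4, the last release
# before the new dependency resolver became the default (pip 20.3).
# File name, env name, Python version and packages are placeholders.
cat > conda-pinned.yml <<'EOF'
name: repro-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip<=20.2.4
  - pip:
      - azureml-defaults
EOF

# To materialize the environment locally (requires conda), run:
#   conda env create -f conda-pinned.yml
```

If the local `conda env create` hangs at the pip step without the pin but completes with it, that points to a resolver conflict rather than a service-side problem.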
@vs-li and @vizhur thank you for your comments.
@vs-li In the Azure portal, the Environment tab is still stuck at installing pip dependencies.
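Since the portal only shows the build stuck at the pip step, the ACR task runs mentioned earlier are another place to look. The helper below is a hypothetical sketch: the function name is made up, and it assumes you know the name of the container registry attached to your workspace and are already logged in with `az login`.

```shell
# Hypothetical helper: list recent ACR task runs and, if a run ID is given,
# print the log for that run.
#   $1 = name of the container registry attached to the workspace
#   $2 = (optional) run ID taken from the list output
show_image_build_log() {
  registry="$1"
  run_id="$2"
  # Show recent runs so the image build run can be identified.
  az acr task list-runs --registry "$registry" --output table
  if [ -n "$run_id" ]; then
    # Print the log of the selected run.
    az acr task logs --registry "$registry" --run-id "$run_id"
  fi
}
```

Calling it first with only the registry name gives the run list; calling it again with a run ID from that list prints the corresponding build log.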
@bhushangholave you have a dependency conflict. Unfortunately, conda swallows pip's log, so you cannot see pip's struggles. If you run pip install -r requirements.txt with those dependencies using the latest pip, you should see where pip gets stuck. There are a number of tickets filed against pip for this kind of issue; feel free to file one too. To unblock yourself, you can try pinning pip<=20.2.4 to use the old resolver, which ignores some of the conflicts.
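The local check described above could look roughly like this. It is a sketch, not an official procedure: the function name and the `.venv-repro` directory are made up, and it assumes `python3` with the `venv` module is available and that your pip dependencies are in a requirements file.

```shell
# Hypothetical helper: install a requirements file with the latest pip in a
# throwaway virtual environment, so pip's resolver output is visible directly.
#   $1 = requirements file (default: requirements.txt)
#   $2 = virtual environment directory (default: .venv-repro)
repro_pip_resolve() {
  req_file="${1:-requirements.txt}"
  venv_dir="${2:-.venv-repro}"
  python3 -m venv "$venv_dir" || return 1
  # Upgrade to the latest pip so the current resolver is used.
  "$venv_dir/bin/python" -m pip install --upgrade pip || return 1
  # Watch the output here: long backtracking shows where pip gets stuck.
  "$venv_dir/bin/python" -m pip install -r "$req_file"
}
```

Running `repro_pip_resolve requirements.txt` outside of conda keeps pip's own messages on the terminal, which is the part that the conda-driven image build hides.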
@vizhur you are right: I corrected my dependencies file and it worked. But the portal or the deployment creation should surface the correct error and logs for debugging.
@bhushangholave Since the workaround helped, closing this thread for now.
az --version
azure-cli 2.29.0 *
az extension list
[
  {
    "experimental": false,
    "extensionType": "whl",
    "name": "ml",
    "path": "/home/icertisadmin/.azure/cliextensions/ml",
    "preview": true,
    "version": "2.0.2"
  }
]
Description of issue (in as much detail as possible)
Command:- az ml online-deployment create --name blue --endpoint advanced-ner-endpoint-demo -f deployment.yml --all-traffic
Exception details
Creating or updating online_deployments
Check: endpoint advanced-endpoint exists
The deployment request bhushan-workspace-advanced-ner-endpoint-demo-8612672 was accepted. ARM deployment URI for reference:
https://ms.portal.azure.com/#blade/HubsExtension/DeploymentDetailsBlade/overview/id/%2Fsubscriptions%2F2b044625-9119-453a-8f50-53426430883b%2
Registering model version (ad2f42d3-xxxx-xxxx-be9f-2e20ef135c16 1 ) Done (1s)
Registering code version (c665e43f-xxxx-xxxx-a98d-a84911248e91 1 ) Done (1s)
Registering environment version (32890045-xxxx-48d3-a43c-6f2ec0323d08 1 ) Done (6s)
Creating deployment blue ........................................................................
Code: ResourceDeploymentFailure
Message: The resource operation completed with terminal provisioning state 'Failed'.
Exception Details: (DeploymentTimedOut) The Managed Inference service creation is taking longer than our normal time.