
Inference works on ml.inf1.xlarge but fails on ml.inf1.24xlarge with "The PyTorch Neuron Runtime could not be initialized" #471

Closed
aj2622 opened this issue Aug 18, 2022 · 3 comments

Comments

aj2622 commented Aug 18, 2022

Deployment code:

from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_data,  # path to your model and script
    role=role,              # IAM role with permissions to create an Endpoint
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuron:1.10.2-transformers4.20.1-neuron-py37-sdk1.19.1-ubuntu18.04'
)

# Let SageMaker know that we've already compiled the model via neuron-cc
huggingface_model._is_compiled_model = True

# deploy the model to an endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,         # number of instances
    instance_type="ml.inf1.24xlarge"  # AWS Inferentia instance
)

When I use ml.inf1.xlarge, my endpoint works as expected. The moment I switch to ml.inf1.24xlarge, ml.inf1.6xlarge, or ml.inf1.2xlarge, I get hit with the following error.
[screenshot: endpoint error log showing "The PyTorch Neuron Runtime could not be initialized"]

What am I missing here?

@aj2622 aj2622 changed the title Inference works on ml.inf1.xlarge but fails on ml.inf1.xlarge with "The PyTorch Neuron Runtime could not be initialized" Inference works on ml.inf1.xlarge but fails on ml.inf1.24xlarge with "The PyTorch Neuron Runtime could not be initialized" Aug 18, 2022

aj2622 commented Aug 18, 2022

In case it's important, this is how I am loading the model:
[screenshot: model loading code]

and this is how I traced it:
[screenshot: torch.neuron tracing code]

The model is LayoutLM from Hugging Face.
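
Since the screenshots above did not survive, here is a minimal sketch of what loading and tracing a LayoutLM model for Inferentia typically looks like with the Neuron SDK. The checkpoint name, task head, sequence length, and input shapes below are assumptions for illustration, not the exact code from the screenshots:

import torch
import torch.neuron  # provided by the torch-neuron package; registers torch.neuron.trace
from transformers import LayoutLMForSequenceClassification

# Load the model; return_dict=False keeps outputs as plain tuples,
# which tracing handles more reliably than dict-style outputs.
model = LayoutLMForSequenceClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased",  # hypothetical checkpoint
    return_dict=False,
)
model.eval()

# Dummy inputs in LayoutLM's positional argument order: token ids,
# per-token bounding boxes, attention mask, token type ids. Neuron
# compiles for these fixed shapes, so they must match the shapes
# sent at inference time.
seq_len = 512
input_ids = torch.zeros((1, seq_len), dtype=torch.long)
bbox = torch.zeros((1, seq_len, 4), dtype=torch.long)
attention_mask = torch.ones((1, seq_len), dtype=torch.long)
token_type_ids = torch.zeros((1, seq_len), dtype=torch.long)

# Compile the model for Inferentia and save the traced artifact.
neuron_model = torch.neuron.trace(
    model, (input_ids, bbox, attention_mask, token_type_ids)
)
neuron_model.save("model_neuron.pt")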

aj2622 commented Aug 23, 2022

I was able to fix this by limiting the number of model workers to the number of NeuronCores (I was over-assigning).
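
For anyone hitting the same error: with the SageMaker inference toolkit, one way to cap the worker count is the SAGEMAKER_MODEL_SERVER_WORKERS environment variable on the model. This is a sketch, not the exact fix used here; the value should match the NeuronCore count of the instance type (inf1.xlarge and inf1.2xlarge have 4 NeuronCores, inf1.6xlarge has 16, inf1.24xlarge has 64):

# Sketch of the fix described above: cap the multi-model-server worker
# pool at the number of NeuronCores so each worker can acquire a core.
# (Reuses model_data, role, and the import from the deployment snippet.)
huggingface_model = HuggingFaceModel(
    model_data=model_data,
    role=role,
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuron:1.10.2-transformers4.20.1-neuron-py37-sdk1.19.1-ubuntu18.04',
    env={"SAGEMAKER_MODEL_SERVER_WORKERS": "64"},  # 64 NeuronCores on ml.inf1.24xlarge
)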

@aj2622 aj2622 closed this as completed Aug 23, 2022
aws-taylor (Contributor) commented

Hello @aj2622,

We're working on a pull request to multi-model-server to help avoid this failure mode in the future - awslabs/multi-model-server#1002
