
Inference works on ml.inf1.xlarge but fails on ml.inf1.24xlarge with "The PyTorch Neuron Runtime could not be initialized" #471

Closed
aj2622 opened this issue Aug 18, 2022 · 3 comments

Comments

aj2622 commented Aug 18, 2022

Deployment code:

from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_data,  # path to your model and script
    role=role,              # IAM role with permissions to create an Endpoint
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuron:1.10.2-transformers4.20.1-neuron-py37-sdk1.19.1-ubuntu18.04'
)

# Let SageMaker know that we've already compiled the model via neuron-cc
huggingface_model._is_compiled_model = True

# deploy the model to an endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,         # number of instances
    instance_type="ml.inf1.24xlarge"  # AWS Inferentia instance
)

When I use ml.inf1.xlarge, my endpoint works as expected. The moment I switch to ml.inf1.24xlarge, ml.inf1.6xlarge, or ml.inf1.2xlarge, I get hit with the following error.
[screenshot: endpoint error log showing "The PyTorch Neuron Runtime could not be initialized"]

What am I missing here?

@aj2622 aj2622 changed the title Inference works on ml.inf1.xlarge but fails on ml.inf1.xlarge with "The PyTorch Neuron Runtime could not be initialized" Inference works on ml.inf1.xlarge but fails on ml.inf1.24xlarge with "The PyTorch Neuron Runtime could not be initialized" Aug 18, 2022

aj2622 commented Aug 18, 2022

In case it's important, this is how I am loading the model:
[screenshot: model loading code]

and this is how I traced it:
[screenshot: torch.neuron tracing code]

The model is LayoutLM from Hugging Face.
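
Since the screenshots above did not survive, here is a minimal sketch of what loading and tracing a LayoutLM model for Inferentia typically looks like with the Neuron SDK. The checkpoint name, task head, sequence length, and input shapes below are assumptions for illustration, not the exact code from the screenshots:

import torch
import torch.neuron  # provided by the torch-neuron package; registers torch.neuron.trace
from transformers import LayoutLMForSequenceClassification

# Load the model; return_dict=False keeps outputs as plain tuples,
# which tracing handles more reliably than dict-style outputs.
model = LayoutLMForSequenceClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased",  # hypothetical checkpoint
    return_dict=False,
)
model.eval()

# Dummy inputs in LayoutLM's positional argument order: token ids,
# per-token bounding boxes, attention mask, token type ids. Neuron
# compiles for these fixed shapes, so they must match the shapes
# sent at inference time.
seq_len = 512
input_ids = torch.zeros((1, seq_len), dtype=torch.long)
bbox = torch.zeros((1, seq_len, 4), dtype=torch.long)
attention_mask = torch.ones((1, seq_len), dtype=torch.long)
token_type_ids = torch.zeros((1, seq_len), dtype=torch.long)

# Compile the model for Inferentia and save the traced artifact.
neuron_model = torch.neuron.trace(
    model, (input_ids, bbox, attention_mask, token_type_ids)
)
neuron_model.save("model_neuron.pt")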

aj2622 commented Aug 23, 2022

I was able to fix this by limiting the number of model workers to the number of NeuronCores (I was over-assigning).
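
For anyone hitting the same error: with the SageMaker inference toolkit, one way to cap the worker count is the SAGEMAKER_MODEL_SERVER_WORKERS environment variable on the model. This is a sketch, not the exact fix used here; the value should match the NeuronCore count of the instance type (inf1.xlarge and inf1.2xlarge have 4 NeuronCores, inf1.6xlarge has 16, inf1.24xlarge has 64):

# Sketch of the fix described above: cap the multi-model-server worker
# pool at the number of NeuronCores so each worker can acquire a core.
# (Reuses model_data, role, and the import from the deployment snippet.)
huggingface_model = HuggingFaceModel(
    model_data=model_data,
    role=role,
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference-neuron:1.10.2-transformers4.20.1-neuron-py37-sdk1.19.1-ubuntu18.04',
    env={"SAGEMAKER_MODEL_SERVER_WORKERS": "64"},  # 64 NeuronCores on ml.inf1.24xlarge
)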

@aj2622 aj2622 closed this as completed Aug 23, 2022
aws-taylor (Contributor) commented

Hello @aj2622,

We're working on a pull request to multi-model-server to help avoid this failure mode in the future - awslabs/multi-model-server#1002
