Maybe the cheapest cloud inference option for Yolov5 (AWS Neuron inf1 instance) #2643
@Ownmarc that's really interesting! BTW if the AWS inference instances can exploit FP16 inference then they should benefit from the new autocast PR #2641 I made today for PyTorch Hub models. This seemed to cut about 1/3 off the inference time on a Colab T4. The results.print() method also now displays profiling results. I think up to now the best performance:cost ratio I've seen is from T4 instances, but I have not tried the inf1.xlarge instances yet. Is a 4-GPU/neuron instance the smallest they get?

EDIT: I should clarify, the best performance:cost from enterprise GPUs hosted on the large cloud providers is probably from T4s. The best overall GPU performance per cost would probably be from 1080ti's on consumer clouds like vast.ai.
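For context, the autocast pattern described above can be sketched roughly like this. This is a minimal illustration, not the actual PR code: `infer_fp16` is a hypothetical wrapper name, and it assumes a standard PyTorch model whose ops are autocast-eligible.

```python
import torch

def infer_fp16(model, img):
    # Hypothetical sketch: under autocast, eligible ops run in FP16 on CUDA
    # devices, which is where the reported ~1/3 speedup on a T4 comes from.
    # With enabled=False (e.g. on a CPU-only machine) this is a no-op.
    use_amp = torch.cuda.is_available()
    with torch.no_grad(), torch.cuda.amp.autocast(enabled=use_amp):
        return model(img)
```

On hardware without FP16 tensor cores the wrapper falls back to ordinary FP32 inference, so it is safe to apply unconditionally.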
@glenn-jocher from what AWS says, Amazon EC2 Inf1 instances based on AWS Inferentia chips deliver up to 30% higher throughput and up to 45% lower cost per inference than Amazon EC2 G4 instances. Those G4 instances use the T4 GPU. That is most likely assuming the model fully compiles for their chip, which is currently not the case for yolov5. Yes, the inf1.xlarge is the smallest they offer with this Inferentia chip.
@Ownmarc that's interesting! AWS seems to like offering larger chunks of stuff than GCP. The AWS P4 instances are only available as full 8x A100 machines on AWS, whereas on GCP you can get a smaller instance with 1, 2, 4 or 8 A100s. Are there any special steps you need to get started with the Inf1 instances? BTW I forgot to tell you, when timing CUDA ops it's important to make calls to torch.cuda.synchronize() so you don't get incorrect times. We have a helper function that can replace time.time() that does this here: Lines 89 to 94 in 2bf34f5
This function is always used when profiling (detect.py, PyTorch Hub, test.py, etc.).
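The helper referenced above is along these lines. This is a sketch of the idea rather than the exact repository code; the guarded import is added here so the snippet also runs on machines without PyTorch.

```python
import time

try:
    import torch
except ImportError:  # lets the sketch run without PyTorch installed
    torch = None

def time_synchronized():
    # CUDA kernels launch asynchronously, so time.time() alone would only
    # measure launch overhead. Synchronizing first makes the timestamp
    # reflect completed GPU work, giving accurate inference timings.
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()
```

Typical usage is `t0 = time_synchronized(); model(img); dt = time_synchronized() - t0`.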
Yea, so you need to compile the model for their Inferentia chip; it's pretty easy with the common frameworks (torch/tensorflow/mxnet), and there are a couple of tutorials on how to do so. And yes, I know there are some extra steps required to get the real time taken by the inference. When they release a version that gets the model to fully compile, I'll give it a shot again and try to get better metrics! :)
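The compile step mentioned above looks roughly like the following. This is a hedged sketch, assuming the aws-neuron SDK (`torch_neuron`) is installed on the inf1 instance; `compile_model` and the output path are hypothetical names, and the snippet falls back to plain TorchScript tracing when the SDK is absent so it can be tried off-instance.

```python
import torch

def compile_model(model, example_input, out_path="model_neuron.pt"):
    # Trace the model for Inferentia when torch-neuron is available;
    # otherwise fall back to ordinary TorchScript tracing (assumption:
    # the model is traceable with a fixed input shape).
    model.eval()
    try:
        import torch_neuron
        traced = torch_neuron.trace(model, example_inputs=[example_input])
    except ImportError:
        traced = torch.jit.trace(model, example_input)
    traced.save(out_path)  # the saved artifact is what you load for serving
    return traced
```

Ops that the Neuron compiler cannot handle run on the host CPU, which is why a partially compiled yolov5 still works, just slower than a fully compiled one would.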
@Ownmarc interesting stuff. I responded over at aws-neuron/aws-neuron-sdk#253 to let them know I'm available to offer any help I can.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I've been trying the inf1 EC2 instance from AWS with their own Inferentia chips.
https://aws.amazon.com/ec2/instance-types/inf1/
The Yolov5 model doesn't fully compile yet for their accelerated inference chip, but it is still working pretty well. This issue on their repo will be tracking the support of Yolov5 : aws-neuron/aws-neuron-sdk#253
I have done some really basic speed tests (the zidane image run 10 times, batch size = 1) to compare my 1080ti against one NeuronCore of this Inferentia chip (there are 4 NeuronCores per chip, so if the inference job can be parallelized, the inference time can be divided by 4, as each NeuronCore can load its own yolov5 model and infer independently).
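The per-core parallelization idea above can be sketched with a plain worker pool. This is an illustration of the dispatch pattern only; `parallel_inference` is a hypothetical name, and the per-core models stand in for four independently loaded yolov5 instances, one per NeuronCore.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_inference(models, images):
    # Round-robin images across one model instance per NeuronCore.
    # Because each core runs independently, batch-1 throughput scales
    # up to len(models)-fold for an embarrassingly parallel workload.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [
            pool.submit(models[i % len(models)], img)
            for i, img in enumerate(images)
        ]
        return [f.result() for f in futures]
```

In practice the four models would each be pinned to a different NeuronCore (e.g. via the SDK's core-placement environment variables), so the pool size should match the core count.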
I'll keep this updated as they update their issue about yolov5 support, and maybe I'll make a simple tutorial showing how easy it is to compile and run Yolov5 on their chip.