Compiling yolov5 #253
Managed to get some of it to compile by adding the following code:
But I'm not sure if it's the way to go. I tried to figure out what was causing the problem with the whole model compilation, but couldn't find it. Here is the full trace of the model I managed to compile:
|
So I got Yolov5-x to run on an inf1 instance and ran some speed tests to see how it compares. This is a really basic test with a single image run 10 times. Here are my results on CPU (c5-xlarge), a 1080ti (my own computer) and an inf1-xlarge instance: Here is the trace of the model compilation:
|
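The basic speed test described above (a single image run 10 times) can be sketched roughly as follows. This is only an illustration: `time_inference` and `fake_predict` are hypothetical names that do not appear in the thread, and the warm-up call is an assumption rather than something the original test is known to have done. On a GPU, asynchronous execution would also need to be synchronized before reading the clock.

```python
import time
from statistics import mean


def time_inference(predict, image, runs=10, warmup=1):
    """Call predict(image) `runs` times and return per-call latencies in seconds."""
    for _ in range(warmup):
        predict(image)  # warm-up calls are excluded from the timings (assumption)
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(image)
        latencies.append(time.perf_counter() - start)
    return latencies


def fake_predict(image):
    # Hypothetical stand-in for model(image); swap in the real model call.
    return sum(image)


latencies = time_inference(fake_predict, [0.0] * 1000, runs=10)
print(f"mean latency: {mean(latencies) * 1e3:.3f} ms over {len(latencies)} runs")
```

Using the same helper for the CPU, GPU and inf1 models keeps the three measurements comparable.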
Hi Marc, thanks for all the good inputs. As you noted, the model runs successfully when excluding the Detect layer. We expect our upcoming Neuron release to improve performance, and will post an update here when it is released so you can run the test again. |
@aws-zejdaj thanks! Do you know if this model will compile 100%? Also curious to know what's not compatible right now :) |
@aws-zejdaj @Ownmarc thanks for your efforts on getting YOLOv5 to operate correctly on the AWS Inferentia chips! If I can be of any help, let me know; I'm the primary maintainer at https://github.com/ultralytics/yolov5. If you have any specific feedback about what in the Detect() layer is causing incompatibilities, we may be able to feed this info into future YOLOv5 design decisions as well, since part of our goal is ease of YOLOv5 deployment across the largest addressable markets.
To provide additional info, the YOLOv5 Detect() layer is the very last layer, which combines multiple heads (P3/8-small, P4/16-medium, P5/32-large and optionally P6/64-xlarge) into a single output. The source is here. Note that this layer behaves differently during training and deployment, and that a YOLOv5 model.fuse() op will fuse batchnorm and nn.Conv2d() layers together, typically before compilation. If we wanted to incorporate torch.neuron export capability @Ownmarc, then we could also update export.py for this.
import torch
import torch.nn as nn

class Detect(nn.Module):
    stride = None  # strides computed during build
    export = False  # onnx export

    def __init__(self, nc=80, anchors=(), ch=()):  # detection layer
        super(Detect, self).__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [torch.zeros(1)] * self.nl  # init grid
        a = torch.tensor(anchors).float().view(self.nl, -1, 2)
        self.register_buffer('anchors', a)  # shape(nl,na,2)
        self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2))  # shape(nl,1,na,1,1,2)
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv

    def forward(self, x):
        # x = x.copy()  # for profiling
        z = []  # inference output
        self.training |= self.export
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)

                y = x[i].sigmoid()
                y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1), x)

    @staticmethod
    def _make_grid(nx=20, ny=20):
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
|
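To make the inference branch of Detect.forward easier to follow, here is a minimal pure-Python sketch of the same arithmetic for a single grid cell and a single anchor. It is an illustration only, not part of YOLOv5: `decode_box` and `num_predictions` are hypothetical helper names, the anchor size (10, 13) is just an example value, and three anchors per head is taken from the shape comment in the code above.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(raw, grid_xy, stride, anchor_wh):
    """Decode one raw (tx, ty, tw, th) prediction into pixel-space (x, y, w, h)."""
    tx, ty, tw, th = raw
    gx, gy = grid_xy
    aw, ah = anchor_wh
    # xy: sigmoid output mapped to (-0.5, 1.5), offset by the cell index, scaled by stride
    x = (sigmoid(tx) * 2.0 - 0.5 + gx) * stride
    y = (sigmoid(ty) * 2.0 - 0.5 + gy) * stride
    # wh: sigmoid output mapped to (0, 4) and scaled by the anchor size
    w = (sigmoid(tw) * 2.0) ** 2 * aw
    h = (sigmoid(th) * 2.0) ** 2 * ah
    return x, y, w, h


def num_predictions(img_size, strides, na):
    """Rows in the concatenated output: one per anchor per grid cell per head."""
    return sum((img_size // s) ** 2 * na for s in strides)


# All-zero logits give sigmoid = 0.5, i.e. the cell centre and the unscaled anchor.
print(decode_box((0.0, 0.0, 0.0, 0.0), grid_xy=(3, 5), stride=8, anchor_wh=(10.0, 13.0)))
# → (28.0, 44.0, 10.0, 13.0)

# A 608x608 input with the P3/8, P4/16 and P5/32 heads and 3 anchors each.
print(num_predictions(608, strides=(8, 16, 32), na=3))  # → 22743
```

The in-place slice assignments and the lazily rebuilt grid in the real layer are the parts that have no counterpart in this scalar version.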
Glenn, we'll be happy to discuss the best handling of the Detect layer, feel free to reach out to us at [email protected] |
Hi Glenn, just want to let you know that this is a high priority item and we are actively working on this issue. Thanks! |
@aws-zejdaj that's great news! We welcome PRs, so if you discover any useful improvements to the codebase at https://github.com/ultralytics/yolov5 we'd be happy to review and integrate them there as well. |
@Ownmarc @aws-zejdaj @aws-renfu good news 😃! This issue may now have been fixed ✅ in PR #ultralytics/yolov5#2953. To receive this update you can:
Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀! |
Nice @glenn-jocher ! Thanks |
Closing this ticket since the latest torch neuron release supports the ultralytics Yolo v5 model - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/torch-neuron/torch-neuron.html#pytorch-neuron-rn |
…ons (aws-neuron#253) * Add note for aws-neuron-dkms install to address multiple kernel versions * fix troubleshooting guide reference * clarify how to check if dkms is installed for the current kernel * clarify dkms dependency on linux kernel
Hi, I'm trying to replicate the steps indicated here to convert YoloV5s to neuron in inf1. I am using Ubuntu 18.04 DLAMI. Activate the
But this gives me the following error:
Could you please guide me through how to perform the conversion for deploying it on Inf1? @Ownmarc Thanks, |
@diazGT94, I think I had this bug too when I was running Try this :
|
@Ownmarc Thanks, the code you provided helped me. With the new upgrades on the YoloV5 side, is it still required to define this and perform the conversion? I just noticed that when the function below is not included in the conversion, the detections don't perform well even if the NMS parameters are modified.
|
Hello, until a couple of weeks ago I was able to use the following structure to convert my custom models to neuron
However, today I tried to do it again but I got the following error:
I can overcome the error by not passing the subgraph_builder_function argument to the neuron trace function. However, when I tried to run inference with a model created with this "hack", the predicted results were far worse than the ones I used to get when I passed that argument. Should I downgrade my torch.neuron to get the previous results? |
@diazGT94 this is likely related to ultralytics/yolov5#5845 from 5 days ago. A temporary workaround might be to drop down a level, i.e. use model.model instead of model, but I'll think of a better long-term solution to restore the original behavior. |
@diazGT94 I was actually able to convert by running:
|
Hi! I was able to convert the model from yolov5 to neuron with the following code:
Neuron Yolov5 Model:
Yolov5:
Inference script:
Is there something wrong with how I am converting the model or running inference? The labels and the accuracy seem to match what is expected, but the tensors do not. |
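One way to quantify "the tensors do not match" is to compare the two outputs element-wise within a tolerance, since a compiled model may differ slightly from the original if it runs at a different numeric precision. The sketch below uses plain Python lists; the values and the tolerances are made up for illustration and are not taken from the thread, and `max_abs_diff` and `all_close` are hypothetical helper names.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


def all_close(a, b, rel_tol=1e-2, abs_tol=1e-3):
    """True if every pair of elements agrees within the given tolerances."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


# Made-up example values standing in for flattened model outputs.
reference = [0.9123, 0.0456, 12.50, 300.25]
compiled = [0.9119, 0.0459, 12.52, 300.10]

print("max abs diff:", max_abs_diff(reference, compiled))
print("close at 1% tolerance:", all_close(reference, compiled))
print("close at 1e-6 tolerance:", all_close(reference, compiled, rel_tol=1e-6, abs_tol=0.0))
```

If the outputs agree at a loose tolerance and the final labels match, the difference is more likely numerical than a conversion error, though that would need to be confirmed on the real outputs.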
Sorry @josebenitezg, we did not notice this new question as it was posted in a closed issue. I have gone ahead and created a new GitHub issue for you: #435. |
Hey, I am looking to run the yolov5 model (https://github.com/ultralytics/yolov5) on an inf1 instance for inference.
I am first trying to get the original Coco model to compile, but I am hitting the following error. I have followed many aws tutorials (yolov4 and resnet) and am trying to compile on a c5-xlarge instance (4 CPUs with 8 GB of RAM) using the ubuntu 18 DLAMI in the aws_neuron_pytorch_p36 python env.
One thing I noticed is that neuron-cc requires numpy <= 1.18.4 while yolov5 requires numpy >= 1.18.5. I first made sure the model would run correctly by updating numpy to 1.18.5, and then downgraded numpy to 1.18.4 per the neuron-cc requirement before compiling/converting the model.
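The two numpy requirements above do not overlap, which is why the version has to be switched between running and compiling. A small sketch of that check, using only the bounds quoted in this post (`parse_version` and the two predicate names are hypothetical helpers, not part of either package):

```python
def parse_version(text):
    """Turn '1.18.4' into (1, 18, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def neuron_cc_ok(version):
    return parse_version(version) <= (1, 18, 4)  # neuron-cc: numpy <= 1.18.4


def yolov5_ok(version):
    return parse_version(version) >= (1, 18, 5)  # yolov5: numpy >= 1.18.5


for candidate in ["1.18.4", "1.18.5"]:
    print(candidate, "neuron-cc:", neuron_cc_ok(candidate), "yolov5:", yolov5_ok(candidate))
```

Neither candidate satisfies both predicates, matching the conflict described above.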
Not exactly sure where to look to debug this (if at all possible) and would welcome any hint.
fake_image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)
here is the output of
torch.neuron.analyze_model(model, example_inputs=[fake_image])
and then I run the compiling function
model_neuron = torch.neuron.trace(model, example_inputs=[fake_image], compiler_args="-O2")
that gives an error with the following trace: