Segmentation fault during CoreML NMS #752

dlawrences · 2020-07-06T19:24:31Z

Hi team

I have created a model using the coremltools API. This takes the detections done by a YOLOv5 CNN (https://github.com/ultralytics/yolov5) converted from PyTorch to CoreML and does the following operations:

filters out predictions with objectness < 0.1
concatenates all the remaining predictions from the different feature maps (in my case 20x20, 40x40, and 80x80)
slices xywh from the 8-dimensional YOLO prediction vector (i.e. x y w h o classx3)
slices confidence for all the three classes from the 8-dimensional YOLO prediction vector
multiplies class confidence by objectness
sends these 2 tensors as input to the CoreML NMS
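
The steps above can be sketched in NumPy as follows. This is only an illustration of the described pre-processing, not the actual coremltools builder graph: the function name `prepare_nms_inputs`, the (1, anchors, H, W, 8) layout, and the use of boolean filtering (which changes the number of rows) are assumptions.

```python
import numpy as np


def prepare_nms_inputs(feature_maps, obj_threshold=0.1):
    """Turn YOLO-style predictions into (boxes, scores) tensors for NMS.

    feature_maps: iterable of arrays shaped (1, anchors, H, W, 8); the last
    axis is assumed to be [x, y, w, h, objectness, class0, class1, class2].
    """
    # Flatten every feature map to (num_predictions, 8) and concatenate them.
    preds = np.concatenate(
        [fm.reshape(-1, fm.shape[-1]) for fm in feature_maps], axis=0
    )

    # Drop predictions whose objectness is below the threshold.
    preds = preds[preds[:, 4] >= obj_threshold]

    boxes = preds[:, 0:4]        # x, y, w, h
    objectness = preds[:, 4:5]   # kept 2-D so it broadcasts over the classes
    class_conf = preds[:, 5:]    # one confidence per class

    # Class confidence weighted by objectness.
    scores = class_conf * objectness

    # Add the batch axis: (1, N, 4) and (1, N, C).
    return boxes[np.newaxis], scores[np.newaxis]


# Random data standing in for the three feature maps mentioned above.
rng = np.random.default_rng(0)
maps = [rng.random((1, 3, s, s, 8), dtype=np.float32) for s in (20, 40, 80)]
boxes, scores = prepare_nms_inputs(maps)
print(boxes.shape, scores.shape)
```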

When I print the tensors feeding into NMS ("slice_xywh_output" and "multiply_obj_conf_output"), they look like the following:

Shape of key slice_xywh_output
(1, 32, 4)
Data of key slice_xywh_output
[[[ 0.22485352  0.10229492 -0.58935547 -0.62695312]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [ 0.86621094  0.26000977  0.08728027  0.33251953]
  [-1.35449219  0.15258789  0.09832764  0.38842773]
  [ 0.28613281  0.1574707  -1.01953125 -0.796875  ]
  [-2.64648438  0.09161377 -1.03417969 -0.8203125 ]
  [ 0.84667969  0.2902832  -0.5234375   0.03759766]
  [-1.35351562  0.17749023 -0.49121094  0.08111572]
  [ 0.          0.          0.          0.        ]
  [ 0.33544922  0.17712402 -0.26904297 -0.42626953]
  [-2.36328125  0.33276367 -0.33203125 -0.51220703]
  [ 0.44677734 -2.09960938 -0.29321289 -0.52880859]
  [-0.30175781  1.50976562  0.22375488  0.23059082]
  [-0.52929688 -0.98681641  0.23474121  0.16210938]
  [ 0.54492188  0.37475586  1.33984375  3.04882812]
  [ 0.36499023  0.27441406 -0.69091797 -0.75634766]
  [-2.375       0.26928711 -0.67822266 -0.79638672]
  [ 0.54101562 -2.0546875  -0.70166016 -0.83154297]
  [-0.33911133  1.51953125 -0.30151367 -0.1673584 ]
  [ 1.81835938 -0.95898438 -0.18823242 -0.1619873 ]
  [-0.39160156 -0.99169922 -0.25439453 -0.24133301]
  [ 0.60986328  0.13586426  0.34594727  1.25488281]
  [-0.40820312  1.57617188 -0.92041016  0.27319336]
  [-0.37402344 -1.00976562 -0.90527344  0.24584961]
  [ 0.          0.          0.          0.        ]
  [ 0.3996582  -0.82910156  1.22265625  1.10742188]
  [ 0.50878906  1.27929688  0.00901031 -0.51855469]
  [ 0.17687988  0.84765625  1.88476562  2.375     ]
  [ 0.22387695 -0.90722656  1.93457031  1.92871094]
  [ 0.1920166   1.33789062 -0.06939697  1.67480469]
  [ 0.34277344 -0.66992188 -0.02412415  1.31542969]
  [ 0.          0.          0.          0.        ]]]
Shape of key multiply_obj_conf_output
(1, 32, 3)
Data of key multiply_obj_conf_output
[[[ -6.69376373   6.77657318 -12.07177734]
  [ -6.70858765   6.36968231  -8.99402618]
  [  7.53936768  -7.6975708   -9.77398682]
  [  6.46255493  -6.45073318  -7.97967911]
  [ -1.18617153   1.18326187  -2.25595665]
  [ -1.45161152   1.38298988  -1.9520216 ]
  [  8.32269287  -8.4524231  -10.12394714]
  [  6.39775085  -6.47781372  -7.074646  ]
  [  0.           0.           0.        ]
  [ -5.05448151   5.08499146  -5.42060089]
  [ -3.5579443    3.58856964  -3.909235  ]
  [ -3.24484062   3.00595665  -3.02586365]
  [ -4.54496384   4.41510773  -7.90267181]
  [ -7.25372314   7.07107544 -11.00234985]
  [  4.61853027  -4.69714355  -6.2722168 ]
  [ -4.49057007   4.47177124  -4.6456604 ]
  [ -3.77206802   3.68643188  -4.17170334]
  [ -3.68796921   3.64676285  -3.74790573]
  [ -6.08796692   5.90591049 -10.37721634]
  [ -0.74313879   0.73117495  -1.14116669]
  [ -9.19396973   9.21398926 -15.23486328]
  [  6.42531967  -6.49116707  -8.14428329]
  [ -5.39413643   5.56509018  -9.39154434]
  [ -7.76382446   8.20882416 -12.68722534]
  [  0.           0.           0.        ]
  [ -4.36952209  -3.31585693   3.15795898]
  [ -2.0806694   -1.78788662   1.93880558]
  [ -6.69223022   6.86398315  -7.12774658]
  [ -8.8374939    7.79048157  -9.8401413 ]
  [-11.71128082  12.10197449 -15.44669342]
  [-13.67028809  14.2027359  -18.90808868]
  [  0.           0.           0.        ]]]

However, sometimes, feeding these two tensors into an NMS layer like the following

builder.add_nms(
    name="nms",
    input_names=["slice_xywh_output", "multiply_obj_conf_output"],
    output_names=["raw_coordinates", "raw_confidence", "raw_indices", "num_boxes"],
    iou_threshold=0.5,
    score_threshold=0.4,
    max_boxes=15,
    per_class_suppression=True
)

triggers a segmentation fault, and the result looks wrong:

Shape of key raw_confidence
(1, 15, 3)
Data of key raw_confidence
[[[-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [-6.70858765  6.36968231 -8.99402618]
  [ 0.          0.          0.        ]]]
Shape of key raw_coordinates
(1, 15, 4)
Data of key raw_coordinates
[[[-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [-2.84179688  0.07049561 -0.58007812 -0.61083984]
  [ 0.          0.          0.          0.        ]]]
zsh: segmentation fault  python3 predict.py
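
For cross-checking results like the ones above, it can help to run the same inputs through a plain reference implementation. The sketch below is a generic greedy per-class NMS in pure Python, not Core ML's implementation. It assumes boxes are center-based (x, y, w, h) with non-negative width and height and that scores are already probabilities; the function names and the final ranking by best class score are illustrative choices.

```python
def _corners(box):
    """Convert a center-based (x, y, w, h) box to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def iou_xywh(a, b):
    """Intersection over union of two center-based (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = _corners(a)
    bx1, by1, bx2, by2 = _corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def reference_nms(boxes, scores, iou_threshold=0.5, score_threshold=0.4, max_boxes=15):
    """Greedy per-class NMS. Returns indices of surviving boxes, best score first."""
    num_classes = len(scores[0]) if scores else 0
    kept = set()
    for c in range(num_classes):
        # Candidates for this class, highest score first.
        order = sorted(
            (i for i in range(len(boxes)) if scores[i][c] >= score_threshold),
            key=lambda i: scores[i][c],
            reverse=True,
        )
        selected = []
        for i in order:
            # Keep a box only if it does not overlap a better box of this class.
            if all(iou_xywh(boxes[i], boxes[j]) <= iou_threshold for j in selected):
                selected.append(i)
        kept.update(selected)
    # Rank survivors by their best class score and cap the total count.
    ranked = sorted(kept, key=lambda i: max(scores[i]), reverse=True)
    return ranked[:max_boxes]
```

If the layer and a reference like this disagree on well-formed inputs, that narrows the problem to the layer; if the inputs themselves are malformed, both results are suspect.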

This has been tested on a 2020 MacBook Pro (16 GB RAM, 2.0 GHz processor).

Config

# Name                    Version                   Build  Channel
coremltools               4.0b1                    pypi_0    pypi

Any thoughts on this?

Thanks


anilkatti · 2020-07-06T19:46:44Z

@dlawrences thanks for reporting this. We are looking into this.

bhushan23 · 2020-07-07T23:24:57Z

@dlawrences can you share new mlmodel with nms layer?

dlawrences · 2020-07-08T03:42:39Z

> @dlawrences can you share new mlmodel with nms layer?

Sure. Could you please provide a private e-mail address so I can send it over? I wouldn't be able to share it publicly tho for now.

Cheers

moto-apple · 2020-07-09T17:27:15Z

> Sure. Could you please provide a private e-mail address so I can send it over? I wouldn't be able to share it publicly tho for now.

Hello @dlawrences, could you file a radar and attach the model there? Thank you.
https://developer.apple.com/bug-reporting/

dlawrences · 2020-07-14T19:22:04Z

Hi

I'll try to file the radar in the coming days.

In terms of the code, this is what I am doing:

builder.add_nms(
    name="nms",
    input_names=["raw_coordinates", "raw_confidence"],
    output_names=["coordinates", "confidence", "indices", "num_boxes"],
    iou_threshold=0.5,
    score_threshold=0.0,
    max_boxes=10,
    per_class_suppression=True
)

With the settings above, I am actually getting the following error:

objc[12134]: Attempted to unregister unknown __weak variable at 0x7f83c98913d0. This is probably incorrect use of objc_storeWeak() and objc_loadWeak(). Break on objc_weak_error to debug.

python3(12134,0x10efc8dc0) malloc: *** error for object 0x7f83c98912b0: pointer being freed was not allocated
python3(12134,0x10efc8dc0) malloc: *** set a breakpoint in malloc_error_break to debug

The description of the spec is the following:

input {
  name: "raw_results1"
  type {
    multiArrayType {
      shape: 1
      shape: 3
      shape: 18
      shape: 18
      shape: 8
      dataType: FLOAT32
    }
  }
}
input {
  name: "raw_results2"
  type {
    multiArrayType {
      shape: 1
      shape: 3
      shape: 36
      shape: 36
      shape: 8
      dataType: FLOAT32
    }
  }
}
input {
  name: "raw_results3"
  type {
    multiArrayType {
      shape: 1
      shape: 3
      shape: 72
      shape: 72
      shape: 8
      dataType: FLOAT32
    }
  }
}
output {
  name: "coordinates"
  type {
    multiArrayType {
      dataType: DOUBLE
    }
  }
}
output {
  name: "confidence"
  type {
    multiArrayType {
      dataType: DOUBLE
    }
  }
}
output {
  name: "indices"
  type {
    multiArrayType {
      dataType: DOUBLE
    }
  }
}
output {
  name: "num_boxes"
  type {
    multiArrayType {
      dataType: DOUBLE
    }
  }
}

The issue is that sometimes I actually get a segmentation fault, as in the original post.

The behavior is really weird and inconsistent from one run to the next.

I tried passing the data as both np.float32 and np.double, with no real difference.

Any thoughts?

Thanks

dlawrences · 2020-07-14T19:30:47Z

One thing that is missing from the documentation: I am pretty sure the NonMaximumSuppressionLayerParams layer expects input1 in normalized xywh format, which is not really specified anywhere:

https://apple.github.io/coremltools/coremlspecification/sections/NeuralNetwork.html#nonmaximumsuppressionlayerparams
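
As an illustration of what "normalized xywh" would mean here, the following sketch converts a pixel-space corner box into center-based coordinates scaled by the image size. Whether the layer really requires this form is the suspicion stated above, not something confirmed by the documentation; the function name and the 640x640 image size in the usage line are assumptions.

```python
def corners_to_normalized_xywh(x1, y1, x2, y2, image_width, image_height):
    """Convert pixel corners (x1, y1, x2, y2) to normalized center-based (x, y, w, h)."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    # Sort the corners so width and height are never negative.
    left, right = min(x1, x2), max(x1, x2)
    top, bottom = min(y1, y2), max(y1, y2)
    x = (left + right) / 2 / image_width
    y = (top + bottom) / 2 / image_height
    w = (right - left) / image_width
    h = (bottom - top) / image_height
    return x, y, w, h


print(corners_to_normalized_xywh(160, 160, 480, 320, 640, 640))
# (0.5, 0.375, 0.5, 0.25)
```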

dlawrences · 2020-07-14T19:41:48Z

As described above, the NMS layer is the last step in the pipeline, and everything before it works as expected. For additional debugging purposes, I am providing the actual data that enters the NMS layer.

The attached zip archive contains two files:

raw_coordinates.npy: binary file containing the data in tensor raw_coordinates that is fed as input to the NMS layer
raw_confidence.npy: binary file containing the data in tensor raw_confidence that is fed as input to the NMS layer

inputs.zip
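
Before handing tensors like these to the NMS layer, a quick sanity check can rule out malformed inputs. The sketch below is hypothetical: the function name, the specific checks, and the assumption that boxes should have non-negative width and height and that scores should lie in [0, 1] are assumptions, not requirements quoted from the Core ML documentation.

```python
import numpy as np


def check_nms_inputs(coordinates, confidence):
    """Return a list of human-readable problems found in the NMS inputs."""
    problems = []
    boxes_ok = coordinates.ndim == 3 and coordinates.shape[-1] == 4
    if not boxes_ok:
        problems.append(f"coordinates should be (B, N, 4), got {coordinates.shape}")
    if confidence.ndim != 3:
        problems.append(f"confidence should be (B, N, C), got {confidence.shape}")
    if coordinates.shape[:2] != confidence.shape[:2]:
        problems.append("coordinates and confidence disagree on batch size or box count")
    if not (np.isfinite(coordinates).all() and np.isfinite(confidence).all()):
        problems.append("inputs contain NaN or infinite values")
    if boxes_ok and (coordinates[..., 2:] < 0).any():
        problems.append("some boxes have negative width or height")
    if confidence.size and (confidence.min() < 0 or confidence.max() > 1):
        problems.append("some scores fall outside [0, 1]")
    return problems


# Hypothetical usage with the attached files:
#   coords = np.load("raw_coordinates.npy")
#   conf = np.load("raw_confidence.npy")
# Here the first row printed in the original post is used instead.
coords = np.array([[[0.22485352, 0.10229492, -0.58935547, -0.62695312]]])
conf = np.array([[[-6.69376373, 6.77657318, -12.07177734]]])
for problem in check_nms_inputs(coords, conf):
    print(problem)
```

Applied to that row, the check flags both a negative width/height and out-of-range scores. Whether either is related to the crash is not established in this thread.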

moto-apple · 2020-07-14T21:26:30Z

Hi @dlawrences, thank you for the additional information. Unfortunately I have not been able to recreate the issue. We would like to have a radar with these attachments:

sysdiagnose (*) taken after the problem occurs.
Reproducible script or code
The model file.

(*) You can take a sysdiagnose with the following command in the terminal. It creates a file named something like sysdiagnose_2020.07.14_14-17-36-0700_macOS_MacBookPro16-1_xxxxxx.tar.gz. Please attach it to the radar.

$ sudo sysdiagnose

pocketpixels · 2021-07-03T00:31:07Z

@dlawrences I ran into this as well. The issue in my case (and very likely in your case) was that Yolo does not use a softmax but predicts probabilities for all the classes independently. This means that the probabilities for multiple classes can sum up to more than 1. CoreML's NMS network expects the probabilities to sum to at most 1. When the sum exceeds 1 you get a crash (in my case when running on the Neural Engine on iOS).

PS edit: In what I wrote above I was maybe a bit overly confident that this is the same issue you are seeing. It very well might not be. But try limiting the predicted confidence values to sum to <= 1 before feeding them into the NMS layer and see if it fixes the crashes for you.
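
One way to try this is to rescale only the boxes whose class confidences sum to more than 1 and leave the rest untouched. This is a minimal NumPy sketch of that idea, assuming a (B, N, C) array of non-negative confidences; the function name is illustrative, and in a real model the same operation would have to be expressed with Core ML layers.

```python
import numpy as np


def cap_confidence_sum(confidence):
    """Scale each box's class confidences so that they sum to at most 1."""
    totals = confidence.sum(axis=-1, keepdims=True)
    # Rows summing to <= 1 are divided by 1 (unchanged); others by their sum.
    return confidence / np.maximum(totals, 1.0)


conf = np.array([[[0.9, 0.8, 0.1], [0.3, 0.2, 0.1]]])
capped = cap_confidence_sum(conf)
print(capped.sum(axis=-1))
```

Dividing by the sum keeps the relative ranking of classes within a box, so the top class stays the same.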

dlawrences · 2021-07-08T06:17:09Z

Thanks @pocketpixels. I moved away from this approach and ended up using a pipeline model, where the CNN is just one model in the pipeline, feeding its results to another "model" that is based on the NMS implementation from CoreML. That seems to work as expected.

For reference, here's an example: https://github.com/hollance/coreml-survival-guide/blob/4dfcbb97c065726a3da240c55d90b2075959801d/MobileNetV2%2BSSDLite/ssdlite.py#L333

pocketpixels · 2021-07-08T18:59:36Z

@dlawrences Actually, that's how I am using the CoreML NMS also, as a model in a pipeline. (Sorry, I clearly didn't read your original description closely enough).
For me, it only crashed in the NMS (with a completely non-descriptive error) when it was running on the Neural Engine (and when the probabilities summed to > 1). When running on GPU/CPU it ran fine, and would just return a bounding box confidence > 1.
I would still recommend changing your model to normalize the class probabilities to keep their sum from exceeding 1 to avoid surprise mystery crashes in the future.

gomer-noah · 2021-08-07T04:27:35Z

@dlawrences Any chance you'd still be willing to share the mlmodel you put together to decode YOLOv5 output? I'm trying to do the same using the builder but running into an "Error computing NN outputs" I can't get past. I'm sure it's something I'm doing wrong but I haven't been able to figure it out.

pocketpixels · 2021-08-08T00:16:27Z

@gomer-noah You can try my CoreML export script
You might want to change nms.pickTop.perClass to True, depending on your application.

gomer-noah · 2021-08-08T01:09:56Z

> @gomer-noah You can try my CoreML export script
> You might want to change nms.pickTop.perClass to True, depending on your application.

I'll take a look. Thank you so much!

TobyRoseman · 2021-10-13T22:56:34Z

@dlawrences - did you ever submit the requested information to https://developer.apple.com/bug-reporting/ ? If so, what was the id value you received?

dlawrences · 2021-10-14T05:18:39Z

@TobyRoseman nope, I never went ahead with the radar report to Apple.

dlawrences added the bug Unexpected behaviour that should be corrected (type) label Jul 6, 2020

1duo added PyTorch (traced) builder labels Jul 6, 2020

TobyRoseman added the awaiting response Please respond to this issue to provide further clarification (status) label Oct 13, 2021

pocketpixels mentioned this issue Oct 18, 2021

Use export.py to generate yolov5s.onnx will get a negative number. ultralytics/yolov5#343

Closed

TobyRoseman closed this as completed Mar 21, 2022
