Polygraphy: How to write the data_loader.py to send the calibration data? #4196

Kongsea · 2024-10-12T09:26:30Z

The example data_loader.py file uses fake data.
I want to know how to write this file so that it feeds image data to Polygraphy to calibrate the model and improve accuracy.

For example, the axis order and the data range.
Is the axis order (image_num, image_channel, height, width) or something else?
Is the data range [0, 1] or [0, 255]? Should it match the input of the .pth model, or is it restricted to a fixed range?

Thank you for any suggestions or help.


Kongsea · 2024-10-12T11:30:47Z

Running trtexec --onnx=model.onnx --saveEngine=model.trt --int8 without calibration data produces a TRT engine that runs inference and yields a low-precision image.

However, running polygraphy convert model.onnx --int8 -o model.trt without calibration data produces a TRT engine whose output is abnormal, with very small values.

I then wrote a data_loader.py to quantize the ONNX model with Polygraphy using calibration data, but the output is very similar to the result without calibration data. I am very confused.

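import cv2
import numpy as np

# `images` is assumed to be a list of image file paths defined elsewhere in the script.
def load_data():
    for image in images:
        img = cv2.imread(image, 0)  # flag 0 reads the image as grayscale, shape (H, W)
        if img.ndim == 2:
            img = np.expand_dims(img, axis=2)  # (H, W) -> (H, W, 1)
        # (H, W, C) -> (1, C, H, W), cast to FP16, stored contiguously
        img = np.transpose(np.expand_dims(img, axis=0), (0, 3, 1, 2)).astype(np.float16)
        yield {"input": np.ascontiguousarray(img)}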

yuanyao-nv · 2024-10-16T20:29:32Z

I think the trtexec and polygraphy commands should be doing the same thing. Not sure why they are giving different results.
cc: @pranavm-nvidia

pranavm-nvidia · 2024-10-16T20:32:22Z

trtexec will initialize the dynamic ranges to fixed values while polygraphy will calibrate on the input data (if none is provided, then it would be synthetic data).
How many images are you using for calibration?
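
To make the difference concrete, here is a simplified sketch of how a dynamic range derived from calibration data determines INT8 rounding error. It uses plain max-abs calibration with symmetric quantization. That scheme is an assumption chosen for illustration, not the algorithm TensorRT's calibrators implement, and the function names and random data are hypothetical.

```python
import numpy as np


def dynamic_range(batches):
    """Return the largest absolute value seen across all calibration batches."""
    return max(float(np.abs(b).max()) for b in batches)


def fake_quantize(x, amax):
    """Symmetric INT8 quantization followed by dequantization."""
    scale = amax / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return (q * scale).astype(np.float32)


# Stand-in calibration batches with values in [0, 1), shaped (N, C, H, W).
rng = np.random.default_rng(0)
batches = [rng.uniform(0, 1, (1, 1, 8, 8)).astype(np.float32) for _ in range(4)]

calibrated = dynamic_range(batches)  # range measured from representative data
mismatched = 255.0                   # range that would fit [0, 255] inputs instead

x = batches[0]
err_calibrated = float(np.abs(fake_quantize(x, calibrated) - x).max())
err_mismatched = float(np.abs(fake_quantize(x, mismatched) - x).max())
print(err_calibrated, err_mismatched)
```

With the mismatched range, every value in [0, 1) rounds to zero, so a range that does not match the real input distribution can wipe out the signal. This only illustrates the general mechanism and is not a diagnosis of the behavior reported above.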

Kongsea · 2024-10-18T03:10:19Z

This is the output using --fp16 of trtexec to quantize without the calibration:

The following is using --int8 of trtexec without calibration:

The following is using --best of trtexec without calibration:

The following is using --int8 of trtexec with int8 calibration data:

So I want to know whether this is caused by an incorrect method of generating the calibration data.

When using polygraphy, an error is raised now:

[E] 1: [calibrator.cpp::add::798] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [executionContext.cpp::commonEmitDebugTensor::1517] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a972910'.)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a97a4d0'.)
........................
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7e2c0d90'.)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a89aa90'.)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaStream::43] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
.......
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaStream::43] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 2: [calibrator.cpp::calibrateEngine::1222] Error Code 2: Internal Error (Assertion context->executeV2(bindings.data()) failed. )
[!] Invalid Engine. Please ensure the engine was built correctly

However, it worked before, and I have not modified anything.

Kongsea · 2024-10-18T03:11:29Z

> trtexec will initialize the dynamic ranges to fixed values while polygraphy will calibrate on the input data (if none is provided, then it would be synthetic data). How many images are you using for calibration?

I have tried calibrating the model with 500, 1000, and more than 3000 images. However, the result is almost the same.

pranavm-nvidia · 2024-10-18T17:09:25Z

Calibration is performed on FP32 models generally. Can you try feeding in FP32 inputs instead? Also make sure that you apply the same preprocessing as you do for inference.
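
A minimal sketch of a data_loader.py along these lines, assuming a single-channel model whose input tensor is named "input" (as in the loader posted earlier) and an inference pipeline that scales pixels to [0, 1]. The calib_images/ directory, the SCALE constant, and the preprocess helper are hypothetical. Replace the scaling with whatever your inference code actually does.

```python
import glob

import numpy as np

# Hypothetical location of the calibration images; adjust to your dataset.
IMAGE_PATHS = sorted(glob.glob("calib_images/*.png"))

# Assumption: inference scales uint8 pixels to [0, 1]. Use 1.0 to keep [0, 255].
SCALE = 1.0 / 255.0


def preprocess(img, scale=SCALE):
    """Convert an (H, W) or (H, W, C) image to a contiguous (1, C, H, W) FP32 array."""
    img = np.asarray(img)
    if img.ndim == 2:
        img = img[:, :, np.newaxis]  # (H, W) -> (H, W, 1)
    chw = np.transpose(img, (2, 0, 1)).astype(np.float32) * scale
    return np.ascontiguousarray(chw[np.newaxis, ...])


def load_data():
    """Yield one feed dict per calibration image, keyed by input tensor name."""
    import cv2  # imported here so preprocess() can be used without OpenCV

    for path in IMAGE_PATHS:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:  # unreadable file: skip it
            continue
        yield {"input": preprocess(img)}
```

The script could then be passed to Polygraphy with polygraphy convert model.onnx --int8 --data-loader-script data_loader.py -o model.trt. Keeping the array math in a separate preprocess function makes it easier to reuse the exact same code path at inference time.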

Kongsea · 2024-10-19T06:49:24Z

> Calibration is performed on FP32 models generally. Can you try feeding in FP32 inputs instead? Also make sure that you apply the same preprocessing as you do for inference.

I used FP16 when training the network. So do I need to use FP32 to calibrate the model when I quantize it?
Thank you.

pranavm-nvidia · 2024-10-21T17:13:17Z

I believe so. We disable FP16 mode when calibrating.

The other option is to use quantization-aware training so that the model already has quantization information baked in, or use ModelOpt to do post-training quantization.

Kongsea · 2024-10-22T02:52:57Z

> I believe so. We disable FP16 mode when calibrating.
>
> The other option is to use quantization-aware training so that the model already has quantization information baked in, or use ModelOpt to do post-training quantization.

OK. Thank you. I will have a try.

yuanyao-nv added the "triaged" and "Tools: Polygraphy" labels on Oct 16, 2024
