Polygraphy: How to write the data_loader.py to send the calibration data? #4196

Open
Kongsea opened this issue Oct 12, 2024 · 9 comments
Labels: Tools: Polygraphy, triaged

Comments

@Kongsea

Kongsea commented Oct 12, 2024

The example data_loader.py uses fake data.
I want to know how to write this file so that it feeds real image data to Polygraphy, in order to calibrate the model and improve accuracy.

Specifically, what axis order and data range are expected?
Is the axis order (image_num, image_channel, height, width) or something else?
Is the data range [0, 1] or [0, 255]? Should it match the input of the .pth model, or is it restricted to a fixed range?

Thank you for any suggestions or help.

@Kongsea
Author

Kongsea commented Oct 12, 2024

Running trtexec --onnx=model.onnx --saveEngine=model.trt --int8 without calibration data produces a TensorRT engine that runs inference and outputs a low-precision image.

However, running polygraphy convert model.onnx --int8 -o model.trt without calibration data produces an engine whose output is abnormal, consisting of very small numbers.

I then wrote a data_loader.py so that Polygraphy could quantize the ONNX model with calibration data, but the output is very similar to the output without calibration data, which confuses me.

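import cv2
import numpy as np

# `images` is the list of calibration image paths, defined elsewhere.
def load_data():
    for image in images:
        img = cv2.imread(image, 0)  # flag 0 reads the image as grayscale, shape (H, W)
        if len(img.shape) == 2:
            img = np.expand_dims(img, axis=2)  # (H, W) -> (H, W, 1)
        # Add a batch axis, then reorder NHWC -> NCHW: (1, H, W, C) -> (1, C, H, W).
        img = np.transpose(np.expand_dims(img, axis=0), (0, 3, 1, 2))
        img = np.ascontiguousarray(img.astype(np.float16))
        yield {
            "input": img
        }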

@yuanyao-nv
Collaborator

I think the trtexec and polygraphy commands should be doing the same thing. Not sure why they are giving different results.
cc: @pranavm-nvidia

yuanyao-nv added the triaged and Tools: Polygraphy labels on Oct 16, 2024
@pranavm-nvidia
Collaborator

trtexec will initialize the dynamic ranges to fixed values while polygraphy will calibrate on the input data (if none is provided, then it would be synthetic data).
How many images are you using for calibration?
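
To make the effect of the dynamic range concrete, here is a simplified sketch of symmetric INT8 quantization in NumPy. It is an illustration only, not TensorRT's actual implementation: the helper `fake_quantize`, the synthetic activation values, and the fixed range of 1.0 are all assumptions chosen for the example.

```python
import numpy as np


def fake_quantize(x, amax):
    """Quantize to INT8 with a symmetric range [-amax, amax], then dequantize.

    Simplified illustration; `amax` plays the role of the dynamic range.
    """
    scale = amax / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return (q * scale).astype(np.float32)


# Synthetic stand-in for a tensor whose real values span roughly [0, 255].
rng = np.random.default_rng(0)
activations = rng.uniform(0.0, 255.0, size=10_000).astype(np.float32)

# A range that does not match the data clips almost every value.
fixed = fake_quantize(activations, amax=1.0)

# A range derived from the data itself (what calibration tries to estimate).
calibrated = fake_quantize(activations, amax=float(np.abs(activations).max()))

err_fixed = float(np.abs(activations - fixed).mean())
err_calibrated = float(np.abs(activations - calibrated).mean())
print(f"mean abs error, fixed range:      {err_fixed:.3f}")
print(f"mean abs error, calibrated range: {err_calibrated:.3f}")
```

The same reasoning applies in reverse: if the calibration data does not resemble the data seen at inference time, the estimated range is wrong and the quantized output degrades.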

@Kongsea
Author

Kongsea commented Oct 18, 2024

This is the output of an engine built with trtexec --fp16, without calibration:
Image

This is the output with trtexec --int8, without calibration:
Image

This is the output with trtexec --best, without calibration:
Image

This is the output with trtexec --int8, with INT8 calibration data:
Image

So I want to know whether this is caused by an incorrect method of generating the calibration data.

When using polygraphy, an error is raised now:

[E] 1: [calibrator.cpp::add::798] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [executionContext.cpp::commonEmitDebugTensor::1517] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a972910'.)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a97a4d0'.)
........................
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7e2c0d90'.)
[E] 1: [graphContext.h::~MyelinGraphContext::72] Error Code 1: Myelin ([impl.cpp:cuda_object_deallocate:474] Error 700 destroying stream '0x7a89aa90'.)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [resizingAllocator.cpp::deallocate::104] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaStream::43] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
.......
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaEvent::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [cudaDriverHelpers.cpp::operator()::96] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[E] 1: [scopedCudaResources.cpp::~ScopedCudaStream::43] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 2: [calibrator.cpp::calibrateEngine::1222] Error Code 2: Internal Error (Assertion context->executeV2(bindings.data()) failed. )
[!] Invalid Engine. Please ensure the engine was built correctly

However, it worked before, and I have not modified anything.

@Kongsea
Author

Kongsea commented Oct 18, 2024

> trtexec will initialize the dynamic ranges to fixed values while polygraphy will calibrate on the input data (if none is provided, then it would be synthetic data). How many images are you using for calibration?

I have tried calibrating with 500, 1000, and more than 3000 images. However, the result is almost the same.

@pranavm-nvidia
Collaborator

Calibration is performed on FP32 models generally. Can you try feeding in FP32 inputs instead? Also make sure that you apply the same preprocessing as you do for inference.
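
A minimal sketch of a data_loader.py along these lines is shown below. Several things in it are assumptions rather than facts from this thread: the glob pattern `calib_images/*.png` is hypothetical, the input tensor name "input" is taken from the script posted above, and the 1/255 scaling is only a placeholder for whatever preprocessing the model actually receives at inference time.

```python
import glob

import numpy as np

IMAGE_GLOB = "calib_images/*.png"  # hypothetical location of the calibration images
INPUT_NAME = "input"               # must match the ONNX model's input tensor name


def preprocess(img, scale=1.0 / 255.0):
    """Turn one uint8 image of shape (H, W) or (H, W, C) into a float32 NCHW batch.

    `scale` is an assumption: use exactly the normalization applied at inference.
    """
    if img.ndim == 2:
        img = img[:, :, np.newaxis]              # (H, W) -> (H, W, 1)
    chw = np.transpose(img, (2, 0, 1))           # (H, W, C) -> (C, H, W)
    batch = chw[np.newaxis].astype(np.float32)   # (C, H, W) -> (1, C, H, W), FP32
    batch = batch * np.float32(scale)
    return np.ascontiguousarray(batch)


def load_data():
    """Yield one feed dict per calibration image."""
    import cv2  # imported lazily so preprocess() is usable without OpenCV

    for path in sorted(glob.glob(IMAGE_GLOB)):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue  # skip files OpenCV could not read
        yield {INPUT_NAME: preprocess(img)}
```

Such a script would typically be passed to Polygraphy with something like polygraphy convert model.onnx --int8 --data-loader-script data_loader.py -o model.trt. Whether [0, 1] or [0, 255] is correct depends entirely on what the network saw during training, so the scale should be checked against the inference pipeline rather than copied from this sketch.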

@Kongsea
Author

Kongsea commented Oct 19, 2024

> Calibration is performed on FP32 models generally. Can you try feeding in FP32 inputs instead? Also make sure that you apply the same preprocessing as you do for inference.

I used FP16 when training the network. So do I need to use FP32 to calibrate the model when I quantize it?
Thank you.

@pranavm-nvidia
Collaborator

I believe so. We disable FP16 mode when calibrating.

The other option is to use quantization-aware training so that the model already has quantization information baked in, or use ModelOpt to do post-training quantization.

@Kongsea
Author

Kongsea commented Oct 22, 2024

> I believe so. We disable FP16 mode when calibrating.
>
> The other option is to use quantization-aware training so that the model already has quantization information baked in, or use ModelOpt to do post-training quantization.

OK, thank you. I will give it a try.

3 participants