
Deploy DeBERTa to Triton Inference Server #4202

Open
nbroad1881 opened this issue Oct 16, 2024 · 1 comment
Labels
triaged Issue has been triaged by maintainers

Comments

nbroad1881 commented Oct 16, 2024

I followed the steps in the DeBERTa guide to create the modified ONNX file with the plugin. When I try using this model with Triton Inference Server, it fails with:

Internal: onnx runtime error 9: Could not find an implementation for DisentangledAttention_TRT(1) node with name 'onnx_graphsurgeon_node_0'

Is there a way to get this to work in Triton? I'm using Triton 24.09.

I can confirm that the ONNX model works fine when run with the onnxruntime package in a Python script. It also works fine in Triton if I don't use the plugin.
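One workaround worth noting (a sketch, not confirmed in this thread): since the DisentangledAttention_TRT node is a TensorRT plugin, the model can be pre-built into a TensorRT engine and served through Triton's native tensorrt backend instead of the onnxruntime backend, with the plugin library preloaded into the server process (e.g. `LD_PRELOAD=/path/to/libplugin.so tritonserver ...`). The model name, tensor names, and dims below are hypothetical placeholders:

```protobuf
# config.pbtxt — hypothetical Triton model config for a prebuilt
# TensorRT engine (model.plan) of the plugin-modified DeBERTa
name: "deberta_trt"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input_ids"          # placeholder tensor name
    data_type: TYPE_INT32
    dims: [ -1 ]               # variable sequence length
  },
  {
    name: "attention_mask"     # placeholder tensor name
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"             # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
```

The engine itself would be built ahead of time (e.g. with `trtexec --onnx=... --saveEngine=model.plan`) on the same GPU architecture that Triton serves on, since TensorRT plans are not portable across architectures.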

A slightly separate problem that might deserve its own issue: the model produces garbage outputs in FP16, even with the LayerNorm kept in FP32.

@yuanyao-nv yuanyao-nv added the triaged Issue has been triaged by maintainers label Oct 18, 2024
yuanyao-nv (Collaborator) commented:

The onnxruntime error indicates that your ONNX model contains a TensorRT-specific plugin node, which onnxruntime doesn't recognize.
Please provide more details on the accuracy issues you're seeing, such as the specific configs you used to run the model. Thanks!
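For the standalone-onnxruntime path, one way the plugin node can be executed (a sketch under assumptions, not confirmed in this thread) is to enable onnxruntime's TensorRT execution provider and point it at the compiled plugin library via the `trt_extra_plugin_lib_paths` option, so the node is delegated to TensorRT rather than to onnxruntime's own kernels. The library and model paths below are hypothetical:

```python
# Sketch: provider configuration for onnxruntime's TensorRT execution
# provider, registering an extra TRT plugin library. Paths are placeholders.
providers = [
    (
        "TensorrtExecutionProvider",
        {
            # hypothetical path to the built DisentangledAttention plugin
            "trt_extra_plugin_lib_paths": "/opt/plugins/libdisentangled_attention.so",
            # keep FP32 while debugging the reported FP16 accuracy issue
            "trt_fp16_enable": False,
        },
    ),
    "CUDAExecutionProvider",  # fallback for unsupported subgraphs
]

# Requires a GPU build of onnxruntime with the TensorRT EP; commented out
# here so the sketch stands alone:
# import onnxruntime as ort
# sess = ort.InferenceSession("deberta_plugin.onnx", providers=providers)
```

A similar effect inside Triton's onnxruntime backend would depend on the backend exposing the TensorRT execution accelerator and plugin options, which is worth checking against the backend's documentation for your Triton version.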
