I followed the steps in the DeBERTa guide to create the modified onnx file with the plugin. When I try using this model with triton inference server, it says
Internal: onnx runtime error 9: Could not find an implementation for DisentangledAttention_TRT(1) node with name 'onnx_graphsurgeon_node_0'
Is there a way to get this to work in triton? I'm using triton 24.09.
I can confirm that the ONNX model works fine when run with the onnxruntime package in a Python script. It also works fine in Triton if I don't use the plugin.
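The `DisentangledAttention_TRT` node is a TensorRT plugin, so Triton's onnxruntime backend can only run it if the graph is routed through the TensorRT execution provider. One documented way to request that is via the model's `config.pbtxt` — a hedged sketch, following Triton's execution-accelerator convention; whether the node actually resolves also depends on the plugin library being loadable (e.g. via `LD_PRELOAD` when launching `tritonserver`):

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      { name : "tensorrt" }
    ]
  }
}
```

This assumes the onnxruntime backend in your Triton build was compiled with TensorRT support; if not, the accelerator request will fail at model load.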
Slightly separate issue that might be better for another issue:
The model's output in FP16 is garbage, even with the LayerNorm kept in FP32.
The onnxruntime error indicates that your ONNX model contains a TRT-specific plugin node, which onnxruntime doesn't recognize.
Please provide more details on the accuracy issues you're seeing, such as the specific configs you used to run the model. Thanks!
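As a standalone check outside Triton, onnxruntime's TensorRT execution provider can be pointed at a plugin library through the `trt_extra_plugin_lib_paths` provider option. A minimal sketch, assuming an onnxruntime build with the TensorRT EP; the plugin path here is hypothetical:

```python
# Hypothetical path to the compiled DisentangledAttention_TRT plugin library.
plugin_path = "/opt/plugins/libdeberta_plugin.so"

# Route the graph through TensorRT so it can resolve the TRT-only node;
# fall back to CUDA/CPU for anything TensorRT does not take.
providers = [
    ("TensorrtExecutionProvider", {"trt_extra_plugin_lib_paths": plugin_path}),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

# With onnxruntime installed and the plugin built, the session would be:
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```

If the TensorRT EP is absent from the build, onnxruntime falls back to the next provider in the list, and the plugin node will again fail to resolve.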