TensorRT Segmentation fault #277

Open
matuszelenak opened this issue Sep 17, 2024 · 7 comments

Comments

@matuszelenak

I'm trying to run the TensorRT version of the Docker container according to the instructions, but I get a segfault whenever I attempt to transcribe any audio. The same audio works fine with the faster-whisper backend.
This happens for both live transcription and file submission.

System info: Debian 12 VM with an RTX 3090 passed through to it, driver version 545.23.06.
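For context, this is roughly how transcription is triggered from the client side. This is only a sketch: the `TranscriptionClient` parameter names follow the WhisperLive README and may differ between versions, and the audio path is a placeholder.

```python
# Minimal client sketch (assumes the client is installed, e.g. via `pip install whisper-live`).
# Parameter names follow the WhisperLive README and may differ between versions.
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",   # host where run_server.py is listening
    9090,          # port passed to run_server.py
    lang="en",
    translate=False,
    use_vad=True,
)

# Transcribe a file (hypothetical path) ...
client("path/to/audio.wav")

# ... or stream live from the microphone:
# client()
```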

Full log:

(base) whiskas@debian-gpu:/mnt/samsung/projects/WhisperLive$ docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it ghcr.io/collabora/whisperlive-tensorrt
root@c3f2d94d2f68:/app# bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Requirement already satisfied: tensorrt_llm==0.9.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 2)) (0.9.0)
Requirement already satisfied: tiktoken in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 3)) (0.7.0)
Requirement already satisfied: datasets in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 4)) (3.0.0)
Requirement already satisfied: kaldialign in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 5)) (0.9.1)
Requirement already satisfied: openai-whisper in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (20231117)
Collecting librosa
  Downloading librosa-0.10.2.post1-py3-none-any.whl (260 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 260.1/260.1 KB 1.9 MB/s eta 0:00:00
Requirement already satisfied: soundfile in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 8)) (0.12.1)
Requirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (0.4.5)
Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 10)) (4.38.2)
Requirement already satisfied: janus in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 11)) (1.0.0)
Installing collected packages: librosa
Successfully installed librosa-0.10.2.post1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Downloading small.en...
--2024-09-17 07:04:29--  https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
Resolving openaipublic.azureedge.net (openaipublic.azureedge.net)... 13.107.246.67, 2620:1ec:bdf::67
Connecting to openaipublic.azureedge.net (openaipublic.azureedge.net)|13.107.246.67|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 483615683 (461M) [application/octet-stream]
Saving to: 'assets/small.en.pt'

small.en.pt                                     100%[=====================================================================================================>] 461.21M  3.22MB/s    in 2m 53s  

2024-09-17 07:07:24 (2.66 MB/s) - 'assets/small.en.pt' saved [483615683/483615683]

Download completed: small.en.pt
whisper_small_en
Running build script for small.en with output directory whisper_small_en
[TensorRT-LLM] TensorRT-LLM version: 0.9.0
[09/17/2024-07:07:26] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
[09/17/2024-07:07:26] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
[09/17/2024-07:07:26] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
[09/17/2024-07:07:26] [TRT] [I] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 598, GPU 17025 (MiB)
[09/17/2024-07:07:28] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1799, GPU +312, now: CPU 2533, GPU 17337 (MiB)
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter dtype is None, using default dtype: DataType.FLOAT, it is recommended to always specify dtype explicitly
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter dtype is None, using default dtype: DataType.FLOAT, it is recommended to always specify dtype explicitly
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter dtype is None, using default dtype: DataType.FLOAT, it is recommended to always specify dtype explicitly
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter dtype is None, using default dtype: DataType.FLOAT, it is recommended to always specify dtype explicitly
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter dtype is None, using default dtype: DataType.FLOAT, it is recommended to always specify dtype explicitly
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter dtype is None, using default dtype: DataType.FLOAT, it is recommended to always specify dtype explicitly
[09/17/2024-07:07:28] [TRT-LLM] [I] Loading encoder weights from PT...
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter was initialized as DataType.HALF but set to DataType.FLOAT
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter was initialized as DataType.FLOAT but set to DataType.HALF
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter was initialized as DataType.FLOAT but set to DataType.HALF
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter was initialized as DataType.FLOAT but set to DataType.HALF
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter was initialized as DataType.FLOAT but set to DataType.HALF
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter was initialized as DataType.FLOAT but set to DataType.HALF
[09/17/2024-07:07:28] [TRT-LLM] [W] Parameter was initialized as DataType.FLOAT but set to DataType.HALF
[09/17/2024-07:07:28] [TRT-LLM] [I] Set bert_attention_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set gpt_attention_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set gemm_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set smooth_quant_gemm_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set identity_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set layernorm_quantization_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set rmsnorm_quantization_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set nccl_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set lookup_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set lora_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set weight_only_groupwise_quant_matmul_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set weight_only_quant_matmul_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set quantize_per_token_plugin to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set quantize_tensor_plugin to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set moe_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set mamba_conv1d_plugin to None.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set context_fmha to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set paged_kv_cache to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set remove_input_padding to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set use_custom_all_reduce to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set multi_block_mode to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set enable_xqa to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set multiple_profiles to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set paged_state to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set streamingllm to False.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set gemm_plugin to float16.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set bert_attention_plugin to float16.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set context_fmha to True.
[09/17/2024-07:07:28] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[09/17/2024-07:07:28] [TRT] [W] IElementWiseLayer with inputs WhisperEncoder/SHUFFLE_0_output_0 and WhisperEncoder/conv1/SHUFFLE_1_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:07:28] [TRT] [W] IElementWiseLayer with inputs WhisperEncoder/conv1/SHUFFLE_1_output_0 and WhisperEncoder/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:07:28] [TRT] [W] IElementWiseLayer with inputs WhisperEncoder/SHUFFLE_2_output_0 and WhisperEncoder/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:07:28] [TRT] [W] IElementWiseLayer with inputs WhisperEncoder/conv1/SHUFFLE_1_output_0 and WhisperEncoder/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:07:28] [TRT] [W] IElementWiseLayer with inputs WhisperEncoder/SHUFFLE_3_output_0 and WhisperEncoder/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:07:28] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
[09/17/2024-07:07:28] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2736, GPU 17363 (MiB)
[09/17/2024-07:07:28] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 2738, GPU 17373 (MiB)
[09/17/2024-07:07:28] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
[09/17/2024-07:07:28] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[09/17/2024-07:08:06] [TRT] [I] Detected 2 inputs and 1 output network tensors.
[09/17/2024-07:08:06] [TRT] [I] Total Host Persistent Memory: 32384
[09/17/2024-07:08:06] [TRT] [I] Total Device Persistent Memory: 0
[09/17/2024-07:08:06] [TRT] [I] Total Scratch Memory: 33602560
[09/17/2024-07:08:06] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 258 steps to complete.
[09/17/2024-07:08:06] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 5.03566ms to assign 6 blocks to 258 nodes requiring 217874944 bytes.
[09/17/2024-07:08:06] [TRT] [I] Total Activation Memory: 217874432
[09/17/2024-07:08:06] [TRT] [I] Total Weights Memory: 176329728
[09/17/2024-07:08:06] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2966, GPU 17569 (MiB)
[09/17/2024-07:08:06] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2966, GPU 17579 (MiB)
[09/17/2024-07:08:06] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
[09/17/2024-07:08:06] [TRT] [I] Engine generation completed in 38.3389 seconds.
[09/17/2024-07:08:06] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 5 MiB, GPU 1126 MiB
[09/17/2024-07:08:06] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +169, now: CPU 0, GPU 169 (MiB)
[09/17/2024-07:08:06] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 4912 MiB
[09/17/2024-07:08:06] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:00:38
[09/17/2024-07:08:06] [TRT-LLM] [I] Config saved to whisper_small_en/encoder_config.json.
[09/17/2024-07:08:06] [TRT-LLM] [I] Serializing engine to whisper_small_en/whisper_encoder_float16_tp1_rank0.engine...
[09/17/2024-07:08:07] [TRT-LLM] [I] Engine serialized. Total time: 00:00:00
[09/17/2024-07:08:07] [TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2961, GPU 17373 (MiB)
[09/17/2024-07:08:07] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[09/17/2024-07:08:07] [TRT-LLM] [I] Loading decoder weights from PT...
[09/17/2024-07:08:07] [TRT-LLM] [I] Set bert_attention_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set gpt_attention_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set gemm_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set smooth_quant_gemm_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set identity_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set layernorm_quantization_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set rmsnorm_quantization_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set nccl_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set lookup_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set lora_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set weight_only_groupwise_quant_matmul_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set weight_only_quant_matmul_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set quantize_per_token_plugin to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set quantize_tensor_plugin to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set moe_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set mamba_conv1d_plugin to None.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set context_fmha to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set paged_kv_cache to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set remove_input_padding to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set use_custom_all_reduce to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set multi_block_mode to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set enable_xqa to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set attention_qk_half_accumulation to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set use_context_fmha_for_generation to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set multiple_profiles to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set paged_state to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set streamingllm to False.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set gemm_plugin to float16.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set gpt_attention_plugin to float16.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set context_fmha to True.
[09/17/2024-07:08:07] [TRT-LLM] [I] Set context_fmha_fp32_acc to False.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/embedding/vocab_embedding/GATHER_0_output_0 and DecoderModel/embedding/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/embedding/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/0/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/0/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/0/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/0/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/0/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/0/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/0/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/0/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/0/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/0/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/0/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/0/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/0/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/0/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/0/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/0/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/0/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/0/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/1/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/1/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/1/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/1/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/1/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/1/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/1/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/1/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/1/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/1/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/1/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/1/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/1/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/1/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/1/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/1/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/1/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/1/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/2/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/2/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/2/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/2/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/2/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/2/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/2/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/2/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/2/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/2/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/2/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/2/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/2/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/2/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/2/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/2/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/2/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/2/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/3/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/3/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/3/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/3/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/3/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/3/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/3/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/3/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/3/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/3/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/3/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/3/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/3/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/3/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/3/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/3/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/3/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/3/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/4/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/4/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/4/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/4/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/4/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/4/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/4/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/4/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/4/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/4/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/4/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/4/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/4/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/4/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/4/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/4/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/4/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/4/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/5/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/5/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/5/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/5/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/5/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/5/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/5/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/5/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/5/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/5/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/5/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/5/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/5/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/5/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/5/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/5/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/5/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/5/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/6/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/6/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/6/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/6/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/6/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/6/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/6/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/6/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/6/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/6/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/6/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/6/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/6/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/6/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/6/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/6/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/6/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/6/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/7/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/7/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/7/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/7/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/7/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/7/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/7/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/7/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/7/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/7/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/7/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/7/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/7/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/7/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/7/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/7/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/7/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/7/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/8/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/8/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/8/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/8/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/8/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/8/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/8/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/8/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/8/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/8/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/8/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/8/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/8/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/8/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/8/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/8/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/8/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/8/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/9/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/9/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/9/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/9/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/9/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/9/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/9/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/9/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/9/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/9/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/9/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/9/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/9/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/9/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/9/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/9/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/9/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/9/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/10/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/10/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/10/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/10/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/10/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/10/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/10/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/10/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/10/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/10/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/10/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/10/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/10/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/10/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/10/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/10/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/10/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/10/ELEMENTWISE_SUM_2_output_0 and DecoderModel/decoder_layers/11/SHUFFLE_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/11/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/11/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[TensorRT-LLM][WARNING] Fall back to unfused MHA because of cross attention.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/11/ELEMENTWISE_SUM_1_output_0 and DecoderModel/decoder_layers/11/SHUFFLE_2_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/11/mlp/SHUFFLE_0_output_0 and DecoderModel/decoder_layers/11/mlp/fc/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/11/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/11/mlp/SHUFFLE_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/11/mlp/SHUFFLE_2_output_0 and DecoderModel/decoder_layers/11/mlp/ELEMENTWISE_POW_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/11/mlp/fc/ELEMENTWISE_SUM_0_output_0 and DecoderModel/decoder_layers/11/mlp/ELEMENTWISE_PROD_1_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/11/mlp/SHUFFLE_3_output_0 and DecoderModel/decoder_layers/11/mlp/ELEMENTWISE_SUM_0_output_0: first input has type Float but second input has type Half.
[09/17/2024-07:08:07] [TRT] [W] IElementWiseLayer with inputs DecoderModel/decoder_layers/11/ELEMENTWISE_PROD_2_output_0 and DecoderModel/decoder_layers/11/mlp/proj/ELEMENTWISE_SUM_0_output_0: first input has type Half but second input has type Float.
[09/17/2024-07:08:07] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
[09/17/2024-07:08:07] [TRT] [W] Unused Input: cross_kv_cache_gen
[09/17/2024-07:08:07] [TRT] [W] [RemoveDeadLayers] Input Tensor cross_kv_cache_gen is unused or used only at compile-time, but is not being removed.
[09/17/2024-07:08:07] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 3015, GPU 17381 (MiB)
[09/17/2024-07:08:07] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 3015, GPU 17389 (MiB)
[09/17/2024-07:08:07] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
[09/17/2024-07:08:07] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[09/17/2024-07:08:17] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[09/17/2024-07:08:17] [TRT] [I] Detected 38 inputs and 25 output network tensors.
[09/17/2024-07:08:17] [TRT] [I] Total Host Persistent Memory: 51472
[09/17/2024-07:08:17] [TRT] [I] Total Device Persistent Memory: 0
[09/17/2024-07:08:17] [TRT] [I] Total Scratch Memory: 232116992
[09/17/2024-07:08:17] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 350 steps to complete.
[09/17/2024-07:08:17] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 14.0794ms to assign 16 blocks to 350 nodes requiring 459499520 bytes.
[09/17/2024-07:08:17] [TRT] [I] Total Activation Memory: 459497984
[09/17/2024-07:08:17] [TRT] [I] Total Weights Memory: 386860032
[09/17/2024-07:08:17] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 3067, GPU 17771 (MiB)
[09/17/2024-07:08:17] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 3067, GPU 17779 (MiB)
[09/17/2024-07:08:17] [TRT] [W] TensorRT was linked against cuDNN 8.9.6 but loaded cuDNN 8.9.2
[09/17/2024-07:08:17] [TRT] [I] Engine generation completed in 10.4557 seconds.
[09/17/2024-07:08:17] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 153 MiB, GPU 1126 MiB
[09/17/2024-07:08:17] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +369, now: CPU 0, GPU 369 (MiB)
[09/17/2024-07:08:17] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 5244 MiB
[09/17/2024-07:08:18] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:00:10
[09/17/2024-07:08:18] [TRT-LLM] [I] Config saved to whisper_small_en/decoder_config.json.
[09/17/2024-07:08:18] [TRT-LLM] [I] Serializing engine to whisper_small_en/whisper_decoder_float16_tp1_rank0.engine...
[09/17/2024-07:08:18] [TRT-LLM] [I] Engine serialized. Total time: 00:00:00
Whisper small.en TensorRT engine built.
=========================================
Model is located at: /app/TensorRT-LLM-examples/whisper/whisper_small_en
root@c3f2d94d2f68:/app# python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_en"
[TensorRT-LLM] TensorRT-LLM version: 0.9.0
--2024-09-17 07:11:35--  https://github.com/snakers4/silero-vad/raw/v4.0/files/silero_vad.onnx
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/snakers4/silero-vad/v4.0/files/silero_vad.onnx [following]
--2024-09-17 07:11:36--  https://raw.githubusercontent.com/snakers4/silero-vad/v4.0/files/silero_vad.onnx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1807522 (1.7M) [application/octet-stream]
Saving to: ‘/root/.cache/whisper-live/silero_vad.onnx’

/root/.cache/whisper-live/silero_vad.onnx       100%[=====================================================================================================>]   1.72M  --.-KB/s    in 0.09s   

2024-09-17 07:11:36 (19.7 MB/s) - ‘/root/.cache/whisper-live/silero_vad.onnx’ saved [1807522/1807522]

[c3f2d94d2f68:00045] *** Process received signal ***
[c3f2d94d2f68:00045] Signal: Segmentation fault (11)
[c3f2d94d2f68:00045] Signal code: Address not mapped (1)
[c3f2d94d2f68:00045] Failing at address: 0x18
[c3f2d94d2f68:00045] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7efd9a0fe520]
[c3f2d94d2f68:00045] [ 1] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN12tensorrt_llm4thop14TorchAllocator6mallocEmb+0x88)[0x7efc04ba9dc8]
[c3f2d94d2f68:00045] [ 2] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6common10IAllocator8reMallocIiEEPT_S4_mb+0xb4)[0x7efc0e65f144]
[c3f2d94d2f68:00045] [ 3] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerI6__halfE14allocateBufferEv+0x3f)[0x7efc0e662b7f]
[c3f2d94d2f68:00045] [ 4] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerI6__halfE10initializeEv+0x1c6)[0x7efc0e664ca6]
[c3f2d94d2f68:00045] [ 5] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerI6__halfEC2ERKNS_7runtime12DecodingModeEiiiiP11CUstream_stSt10shared_ptrINS_6common10IAllocatorEEP14cudaDevicePropSt8optionalIiESH_+0x230)[0x7efc0e665150]
[c3f2d94d2f68:00045] [ 6] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15FtDynamicDecodeI6__halfEC1Emmmmii+0x2ce)[0x7efc04b86fee]
[c3f2d94d2f68:00045] [ 7] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOp14createInstanceEv+0x8a)[0x7efc04b6bdba]
[c3f2d94d2f68:00045] [ 8] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOpC1EllllllN3c1010ScalarTypeE+0x84)[0x7efc04b6bf04]
[c3f2d94d2f68:00045] [ 9] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_IN9torch_ext15DynamicDecodeOpEE12defineMethodIZNSB_3defIJllllllNS1_10ScalarTypeEEEERSB_NS7_6detail5typesIvJDpT_EEESsSt16initializer_listINS7_3argEEEUlNS1_14tagged_capsuleISA_EEllllllSE_E_EEPNS7_3jit8FunctionESsT_SsSN_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5_+0xf8)[0x7efc04b873f8]
[c3f2d94d2f68:00045] [10] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0xa0f34e)[0x7efd982de34e]
[c3f2d94d2f68:00045] [11] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0xa0c8df)[0x7efd982db8df]
[c3f2d94d2f68:00045] [12] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0xa0e929)[0x7efd982dd929]
[c3f2d94d2f68:00045] [13] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0x47de04)[0x7efd97d4ce04]
[c3f2d94d2f68:00045] [14] python3(+0x15adae)[0x55a567c1fdae]
[c3f2d94d2f68:00045] [15] python3(_PyObject_MakeTpCall+0x25b)[0x55a567c1652b]
[c3f2d94d2f68:00045] [16] python3(+0x169680)[0x55a567c2e680]
[c3f2d94d2f68:00045] [17] python3(+0x28139b)[0x55a567d4639b]
[c3f2d94d2f68:00045] [18] python3(_PyObject_MakeTpCall+0x25b)[0x55a567c1652b]
[c3f2d94d2f68:00045] [19] python3(_PyEval_EvalFrameDefault+0x6f0b)[0x55a567c0f16b]
[c3f2d94d2f68:00045] [20] python3(_PyFunction_Vectorcall+0x7c)[0x55a567c206ac]
[c3f2d94d2f68:00045] [21] python3(_PyObject_FastCallDictTstate+0x16d)[0x55a567c1576d]
[c3f2d94d2f68:00045] [22] python3(+0x1657a4)[0x55a567c2a7a4]
[c3f2d94d2f68:00045] [23] python3(_PyObject_MakeTpCall+0x1fc)[0x55a567c164cc]
[c3f2d94d2f68:00045] [24] python3(_PyEval_EvalFrameDefault+0x7611)[0x55a567c0f871]
[c3f2d94d2f68:00045] [25] python3(_PyFunction_Vectorcall+0x7c)[0x55a567c206ac]
[c3f2d94d2f68:00045] [26] python3(_PyEval_EvalFrameDefault+0x8cb)[0x55a567c08b2b]
[c3f2d94d2f68:00045] [27] python3(_PyFunction_Vectorcall+0x7c)[0x55a567c206ac]
[c3f2d94d2f68:00045] [28] python3(_PyObject_FastCallDictTstate+0x16d)[0x55a567c1576d]
[c3f2d94d2f68:00045] [29] python3(+0x1657a4)[0x55a567c2a7a4]
[c3f2d94d2f68:00045] *** End of error message ***
Segmentation fault (core dumped)

@skinnynpale

skinnynpale commented Sep 17, 2024

Same here.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:82:00.0 Off |                  Off |
| 30%   25C    P8              8W /  450W |       1MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

@makaveli10
Collaborator

makaveli10 commented Sep 18, 2024

#276 should resolve this.

@skinnynpale

#276 should resolve this.

Can you please update ghcr.io/collabora/whisperlive-tensorrt:latest? The same problems are still present there.

@matuszelenak
Author

#276 should resolve this.

Unfortunately, it does not seem like it.

(base) whiskas@debian-gpu:~$ docker run -p 9090:9090 --runtime=nvidia --gpus all --entrypoint /bin/bash -it whisper-live-trt:latest
root@7f39b90ea7e6:/app# nvidia-smi
Thu Sep 19 11:45:35 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0 Off |                  N/A |
|  0%   36C    P8             12W /  420W |       4MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@7f39b90ea7e6:/app# bash build_whisper_tensorrt.sh /app/TensorRT-LLM-examples small.en
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Requirement already satisfied: tensorrt_llm==0.10.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 2)) (0.10.0)
Requirement already satisfied: tiktoken in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 3)) (0.3.3)
Requirement already satisfied: datasets in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 4)) (3.0.0)
Requirement already satisfied: kaldialign in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 5)) (0.9.1)
Requirement already satisfied: openai-whisper in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (20231117)
Collecting librosa
  Downloading librosa-0.10.2.post1-py3-none-any.whl (260 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 260.1/260.1 KB 1.9 MB/s eta 0:00:00
Requirement already satisfied: soundfile in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 8)) (0.12.1)
Requirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (0.4.5)
Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 10)) (4.40.2)
Requirement already satisfied: janus in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 11)) (1.0.0)
Installing collected packages: librosa
Successfully installed librosa-0.10.2.post1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Downloading small.en...
--2024-09-19 11:46:00--  https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
Resolving openaipublic.azureedge.net (openaipublic.azureedge.net)... 13.107.253.67, 2620:1ec:29:1::67
Connecting to openaipublic.azureedge.net (openaipublic.azureedge.net)|13.107.253.67|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 483615683 (461M) [application/octet-stream]
Saving to: 'assets/small.en.pt'

small.en.pt                                      100%[=======================================================================================================>] 461.21M  16.6MB/s    in 20s     

2024-09-19 11:46:20 (23.0 MB/s) - 'assets/small.en.pt' saved [483615683/483615683]

Download completed: small.en.pt
whisper_small_en
Running build script for small.en with output directory whisper_small_en
[TensorRT-LLM] TensorRT-LLM version: 0.10.0
[09/19/2024-11:46:22] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
[09/19/2024-11:46:22] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
[09/19/2024-11:46:22] [TRT-LLM] [I] plugin_arg is None, setting it as float16 automatically.
[09/19/2024-11:46:23] [TRT] [I] [MemUsageChange] Init CUDA: CPU +14, GPU +0, now: CPU 594, GPU 263 (MiB)
[09/19/2024-11:46:24] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2132, GPU +396, now: CPU 2882, GPU 659 (MiB)
[09/19/2024-11:46:24] [TRT] [W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.

...

[09/19/2024-11:46:53] [TRT] [I] Total Weights Memory: 386860032 bytes
[09/19/2024-11:46:53] [TRT] [I] Compiler backend is used during engine execution.
[09/19/2024-11:46:53] [TRT] [I] Engine generation completed in 7.84096 seconds.
[09/19/2024-11:46:53] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 153 MiB, GPU 1126 MiB
[09/19/2024-11:46:53] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 5786 MiB
[09/19/2024-11:46:53] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:00:07
[09/19/2024-11:46:53] [TRT-LLM] [I] Config saved to whisper_small_en/decoder_config.json.
[09/19/2024-11:46:53] [TRT-LLM] [I] Serializing engine to whisper_small_en/whisper_decoder_float16_tp1_rank0.engine...
[09/19/2024-11:46:53] [TRT-LLM] [I] Engine serialized. Total time: 00:00:00
Whisper small.en TensorRT engine built.
=========================================
Model is located at: /app/TensorRT-LLM-examples/whisper/whisper_small_en
root@7f39b90ea7e6:/app# python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --trt_model_path "/app/TensorRT-LLM-examples/whisper/whisper_small_en"
[TensorRT-LLM] TensorRT-LLM version: 0.10.0
--2024-09-19 11:47:44--  https://github.com/snakers4/silero-vad/raw/v4.0/files/silero_vad.onnx
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/snakers4/silero-vad/v4.0/files/silero_vad.onnx [following]
--2024-09-19 11:47:44--  https://raw.githubusercontent.com/snakers4/silero-vad/v4.0/files/silero_vad.onnx
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1807522 (1.7M) [application/octet-stream]
Saving to: ‘/root/.cache/whisper-live/silero_vad.onnx’

/root/.cache/whisper-live/silero_vad.onnx        100%[=======================================================================================================>]   1.72M  --.-KB/s    in 0.09s   

2024-09-19 11:47:45 (19.3 MB/s) - ‘/root/.cache/whisper-live/silero_vad.onnx’ saved [1807522/1807522]

[7f39b90ea7e6:00362] *** Process received signal ***
[7f39b90ea7e6:00362] Signal: Segmentation fault (11)
[7f39b90ea7e6:00362] Signal code: Address not mapped (1)
[7f39b90ea7e6:00362] Failing at address: 0x18
[7f39b90ea7e6:00362] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f27331b6520]
[7f39b90ea7e6:00362] [ 1] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN12tensorrt_llm4thop14TorchAllocator6mallocEmb+0x88)[0x7f251c570d58]
[7f39b90ea7e6:00362] [ 2] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerI6__halfE14allocateBufferEv+0xd4)[0x7f253570f434]
[7f39b90ea7e6:00362] [ 3] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerI6__halfE10initializeEv+0x128)[0x7f2535713108]
[7f39b90ea7e6:00362] [ 4] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerI6__halfEC2ERKNS_7runtime12DecodingModeERKNS0_13DecoderDomainEP11CUstream_stSt10shared_ptrINS_6common10IAllocatorEE+0xb1)[0x7f2535713311]
[7f39b90ea7e6:00362] [ 5] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15FtDynamicDecodeI6__halfEC1Emmmmii+0x270)[0x7f251c550c70]
[7f39b90ea7e6:00362] [ 6] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOp14createInstanceEv+0x8a)[0x7f251c5340ca]
[7f39b90ea7e6:00362] [ 7] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOpC1EllllllN3c1010ScalarTypeE+0x84)[0x7f251c534214]
[7f39b90ea7e6:00362] [ 8] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libth_common.so(_ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_IN9torch_ext15DynamicDecodeOpEE12defineMethodIZNSB_3defIJllllllNS1_10ScalarTypeEEEERSB_NS7_6detail5typesIvJDpT_EEESsSt16initializer_listINS7_3argEEEUlNS1_14tagged_capsuleISA_EEllllllSE_E_EEPNS7_3jit8FunctionESsT_SsSN_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5_+0xf8)[0x7f251c551058]
[7f39b90ea7e6:00362] [ 9] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0xa0f34e)[0x7f27312de34e]
[7f39b90ea7e6:00362] [10] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0xa0c8df)[0x7f27312db8df]
[7f39b90ea7e6:00362] [11] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0xa0e929)[0x7f27312dd929]
[7f39b90ea7e6:00362] [12] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_python.so(+0x47de04)[0x7f2730d4ce04]
[7f39b90ea7e6:00362] [13] python3(+0x15cb2e)[0x55af3c364b2e]
[7f39b90ea7e6:00362] [14] python3(_PyObject_MakeTpCall+0x25b)[0x55af3c35b2db]
[7f39b90ea7e6:00362] [15] python3(+0x16b6b0)[0x55af3c3736b0]
[7f39b90ea7e6:00362] [16] python3(+0x2826fb)[0x55af3c48a6fb]
[7f39b90ea7e6:00362] [17] python3(_PyObject_MakeTpCall+0x25b)[0x55af3c35b2db]
[7f39b90ea7e6:00362] [18] python3(_PyEval_EvalFrameDefault+0x6b17)[0x55af3c353d27]
[7f39b90ea7e6:00362] [19] python3(_PyFunction_Vectorcall+0x7c)[0x55af3c36542c]
[7f39b90ea7e6:00362] [20] python3(_PyObject_FastCallDictTstate+0x16d)[0x55af3c35a51d]
[7f39b90ea7e6:00362] [21] python3(+0x1674b4)[0x55af3c36f4b4]
[7f39b90ea7e6:00362] [22] python3(_PyObject_MakeTpCall+0x1fc)[0x55af3c35b27c]
[7f39b90ea7e6:00362] [23] python3(_PyEval_EvalFrameDefault+0x72ea)[0x55af3c3544fa]
[7f39b90ea7e6:00362] [24] python3(_PyFunction_Vectorcall+0x7c)[0x55af3c36542c]
[7f39b90ea7e6:00362] [25] python3(_PyEval_EvalFrameDefault+0x8ab)[0x55af3c34dabb]
[7f39b90ea7e6:00362] [26] python3(_PyFunction_Vectorcall+0x7c)[0x55af3c36542c]
[7f39b90ea7e6:00362] [27] python3(_PyObject_FastCallDictTstate+0x16d)[0x55af3c35a51d]
[7f39b90ea7e6:00362] [28] python3(+0x1674b4)[0x55af3c36f4b4]
[7f39b90ea7e6:00362] [29] python3(_PyObject_MakeTpCall+0x1fc)[0x55af3c35b27c]
[7f39b90ea7e6:00362] *** End of error message ***
Segmentation fault (core dumped)
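
For reference, a minimal client invocation that exercises the transcription path (where the crash above appears) might look like the sketch below. The TranscriptionClient import path and keyword arguments are assumptions taken from the WhisperLive README rather than from this log, and audio.wav is a hypothetical input file.

from whisper_live.client import TranscriptionClient

# Connect to the TensorRT-backed server started above on localhost:9090.
# Keyword arguments are assumed from the project README, not from this log.
client = TranscriptionClient(
    "localhost",
    9090,
    lang="en",
    use_vad=True,
)

# Submitting any audio file starts transcription, which is the point at
# which the segmentation fault above occurs with the original image.
client("audio.wav")  # hypothetical example file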

@Spudra

Spudra commented Sep 20, 2024

I'm running into the exact same issue. Using a GeForce GTX 1050, if that matters.

@makaveli10
Collaborator

Docker image updated on ghcr. Let us know if the issue still persists.

@skinnynpale

skinnynpale commented Sep 21, 2024

Docker image updated on ghcr. Let us know if the issue still persists.

it works! thank you :)
