Full log (source lines truncated) with caching allocator on two CPX devices:
ROCR_VISIBLE_DEVICES=10,11 SHORTFIN_ALLOCATORS=caching SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE=1 python -m shortfin_apps.sd.server --model_config=./python/shortfin_apps/sd/examples/sdxl_config_i8.json --device=amdgpu --fibers_per_device=4 --workers_per_device=1 --isolation="none" --flagfile=./python/shortfin_apps/sd/examples/sdxl_flags_gfx942.txt --build_preference=compile
[2024-11-10 09:55:52.894] [info] Configure allocator amdgpu:0:0@0 = [caching]
[2024-11-10 09:55:53.052] [info] Configure allocator amdgpu:1:0@0 = [caching]
Servicing 4 outstanding tasks
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_clip_dataset_fp16.irpa)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16.mlir)
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16_amdgpu-gfx942.vmfb)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
Servicing 4 outstanding tasks
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_punet_bs1_64_1024x1024_i8.mlir)
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_punet_bs1_64_1024x1024_i8_amdgpu-gfx942.vmfb)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
Servicing 4 outstanding tasks
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_vae_bs1_1024x1024_fp16_amdgpu-gfx942.vmfb)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_vae_bs1_1024x1024_fp16.mlir)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_vae_dataset_fp16.irpa)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
Servicing 3 outstanding tasks
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_EulerDiscreteScheduler_bs1_1024x1024_fp16_amdgpu-gfx942.vmfb)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_EulerDiscreteScheduler_bs1_1024x1024_fp16.mlir)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
2024-11-10 09:55:59 - INFO - Started server process [1297406]
2024-11-10 09:55:59 - INFO - Waiting for application startup.
[2024-11-10 09:55:59.646] [info] [manager.py:60] Starting system manager
[2024-11-10 09:55:59.942] [info] [manager.py:64] Shutting down system manager
2024-11-10 09:55:59 - ERROR - Traceback (most recent call last):
  File "/home/eagarvey/SHARK-Platform/.venv/lib/python3.12/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
    ^^^^^^^^^^^^^^^^^^^^^
  File "/home/eagarvey/SHARK-Platform/shortfin/python/shortfin_apps/sd/server.py", line 51, in lifespan
    service.start()
  File "/home/eagarvey/SHARK-Platform/shortfin/python/shortfin_apps/sd/components/service.py", line 137, in start
    self.inference_programs[worker_idx][component] = sf.Program(
    ^^^^^^^^^^^
ValueError: iree/runtime/src/iree/hal/drivers/hip/event_semaphore.c:142: OUT_OF_RANGE; semaphore values must be monotonically increasing; current_value=2147483647, new_value=31; while invoking native function io_parameters.load; while calling import;
[ 0] bytecode compiled_clip.__init:10116 [
    genfiles/sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16.mlir:2:3,
    genfiles/sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16.mlir:3:3,
    genfiles/sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16.mlir:4:3,
    <truncated>
    genfiles/sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16.mlir:697:3,
    genfiles/sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16.mlir:698:3
]
2024-11-10 09:55:59 - ERROR - Application startup failed. Exiting.
Segmentation fault (core dumped)
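For readers unfamiliar with the `OUT_OF_RANGE` message in the log above, here is a toy Python model of a monotonicity check on a semaphore value. It is only a sketch: `ToyTimelineSemaphore` and `try_signal` are hypothetical names, and the strictly-greater comparison is an assumption made for illustration. It is not the code in `event_semaphore.c`. The one hard fact used is that the reported `current_value=2147483647` equals 2^31 - 1, the largest signed 32-bit integer. The log does not say how the semaphore reached that value.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the current_value reported in the log


class ToyTimelineSemaphore:
    """Hypothetical stand-in for a semaphore whose value may only move forward."""

    def __init__(self, initial_value=0):
        self.current_value = initial_value

    def signal(self, new_value):
        # Assumption for this sketch: a signal must strictly exceed the current value.
        if new_value <= self.current_value:
            raise ValueError(
                "OUT_OF_RANGE; semaphore values must be monotonically increasing; "
                f"current_value={self.current_value}, new_value={new_value}"
            )
        self.current_value = new_value


def try_signal(sem, new_value):
    """Return "ok" on success, or the error text if the signal is rejected."""
    try:
        sem.signal(new_value)
        return "ok"
    except ValueError as exc:
        return str(exc)


# A semaphore sitting at 30 accepts 31.
healthy = ToyTimelineSemaphore(30)
healthy_result = try_signal(healthy, 31)

# A semaphore already at INT32_MAX rejects 31, mirroring the values in the log.
stuck = ToyTimelineSemaphore(INT32_MAX)
stuck_result = try_signal(stuck, 31)

print(healthy_result)
print(stuck_result)
```

In this model, once the stored value sits at 2147483647, any ordinary small timeline value such as 31 is rejected, which matches the shape of the failure reported during `io_parameters.load`.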
With async allocations enabled and default allocator, the process hangs on program load for unet:
ROCR_VISIBLE_DEVICES=10,11 SHORTFIN_AMDGPU_LOGICAL_DEVICES_PER_PHYSICAL_DEVICE=1 python -m shortfin_apps.sd.server --model_config=./python/shortfin_apps/sd/examples/sdxl_config_i8.json --device=amdgpu --fibers_per_device=4 --workers_per_device=1 --isolation="none" --flagfile=./python/shortfin_apps/sd/examples/sdxl_flags_gfx942.txt --build_preference=compile --amdgpu_async_allocations
Servicing 4 outstanding tasks
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16_amdgpu-gfx942.vmfb)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_clip_dataset_fp16.irpa)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_clip_bs1_64_fp16.mlir)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
Servicing 4 outstanding tasks
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_punet_bs1_64_1024x1024_i8_amdgpu-gfx942.vmfb)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_punet_dataset_i8.irpa)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_punet_bs1_64_1024x1024_i8.mlir)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
Servicing 4 outstanding tasks
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_vae_dataset_fp16.irpa)
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_vae_bs1_1024x1024_fp16_amdgpu-gfx942.vmfb)
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_vae_bs1_1024x1024_fp16.mlir)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
Servicing 3 outstanding tasks
Completed BuildFile[gen](sdxl/stable_diffusion_xl_base_1_0_EulerDiscreteScheduler_bs1_1024x1024_fp16.mlir)
Completed BuildFile[bin](sdxl/stable_diffusion_xl_base_1_0_EulerDiscreteScheduler_bs1_1024x1024_fp16_amdgpu-gfx942.vmfb)
Servicing 1 outstanding tasks
Completed BuildEntrypoint(path='sdxl')
2024-11-10 09:57:43 - INFO - Started server process [1298368]
2024-11-10 09:57:43 - INFO - Waiting for application startup.
[2024-11-10 09:57:43.241] [info] [manager.py:60] Starting system manager
Resolved by iree-org/iree#19103