Segfault on linux with AMD GPU #1783

Closed
carnager opened this issue Jan 7, 2024 · 15 comments
Labels
bug (AMD) Something isn't working (AMD specific)

Comments

carnager commented Jan 7, 2024

Read Troubleshoot

[x] I admit that I have read the Troubleshoot before making this issue.

Describe the problem
I installed Fooocus on Linux using the instructions on the main page. I uninstalled regular torch and installed the AMD version as mentioned on the front page. I created 40 GB of swap space and then ran the app with python launch.py --attention-split.
When I try to start an image generation, it seems to do something but then segfaults.
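
For reference, roughly the steps I ran (a sketch: the ROCm 5.6 wheel index is the one from the install instructions, and the swap-file commands are one standard way to create swap; I may have set mine up differently):

$ pip uninstall torch torchvision torchaudio torchtext functorch xformers
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
$ sudo fallocate -l 40G /swapfile && sudo chmod 600 /swapfile   # assumption: swap file rather than a partition
$ sudo mkswap /swapfile && sudo swapon /swapfile
$ python launch.py --attention-split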

Some info about my setup:

GPU:
2d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT] (rev c5)

Memory:
               total        used        free      shared  buff/cache   available
Mem:            31Gi       6,4Gi        12Gi       307Mi        13Gi        24Gi
Swap:           39Gi          0B        39Gi

CPU:
model name	: AMD Ryzen 9 5900X 12-Core Processor

Full Console Log

(fooocus_env) carnager@caprica ~/Apps/Fooocus > python launch.py --attention-split
[System ARGV] ['launch.py', '--attention-split']
Python 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801]
Fooocus version: 2.1.860
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 32018 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6700 XT : native
VAE dtype: torch.float32
Using split optimization for cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/carnager/Apps/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/carnager/Apps/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/carnager/Apps/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/carnager/Apps/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.52 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8714579776560103216
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
zsh: segmentation fault (core dumped)  python launch.py --attention-split
mashb1t (Collaborator) commented Jan 7, 2024

Can you please check if it works without setting --attention-split (not setting any arguments)? Thanks!

mashb1t added the bug and question labels on Jan 7, 2024
carnager (Author) commented Jan 7, 2024

Yeah, I tried that already; same behavior without any arguments.

@codeliger

I have the same issue before and after creating the 40 GB swap partition, so it doesn't seem to be RAM/memory related.

Full logs:

$ python launch.py 
[System ARGV] ['launch.py']
Python 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
Fooocus version: 2.1.860
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
amdgpu.ids: No such file or directory
amdgpu.ids: No such file or directory
Total VRAM 8176 MB, total RAM 32018 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon Graphics : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/codeliger/dl/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/codeliger/dl/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/codeliger/dl/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/codeliger/dl/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.52 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 5323403996105043708
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)

darkraisisi commented Jan 7, 2024

I have a similar problem; it seems like the swap is not being used or found.
I am using an NVIDIA 3090, but when forcing CPU-only an error pops up about virtual memory not being found, so I think this problem is related.
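
A quick diagnostic sketch to compare what psutil reports against the kernel (the RuntimeWarning in the CPU-only log below comes from psutil.virtual_memory()):

$ python -c "import psutil; print(psutil.virtual_memory()); print(psutil.swap_memory())"
$ free -h   # compare with the kernel's own accounting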

Running normally:

[System ARGV] ['launch.py']
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Fooocus version: 2.1.860
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 24257 MB, total RAM 15912 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 : native
VAE dtype: torch.bfloat16
Using pytorch cross attention
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: /home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/myName/Documents/img-gen/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.34 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 5428285024980375409
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] brown horse on the beach, intricate, elegant, highly detailed, wonderful colors, sweet, extremely delicate, majestic, holy, dramatic, sharp focus, professional composition, fantastic, iconic, fine light, excellent, very inspirational, ambient, artistic, vibrant, imposing, epic, thought, magnificent, stunning, awesome, cinematic, dynamic, complex, amazing, creative, brilliant
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] brown horse on the beach, intricate, elegant, highly detailed, extremely shiny, wonderful colors, ambient light, dynamic background, sharp focus, professional fine detail, best animated, cinematic, singular, rich, vivid, beautiful, unique, cute, attractive, epic, gorgeous, stunning, great, awesome, amazing, breathtaking, dramatic, illuminated, outstanding, very coherent, perfect
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 2.52 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.93 seconds
  0%|                                                                                                                                                                  | 0/30 [00:00<?, ?it/s]
Segmentation fault (core dumped)

Running cpu only:

(fooocus) myName@pop-os:~/Documents/img-gen/Fooocus$ python entry_with_update.py --preview-option fast --always-cpu
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--preview-option', 'fast', '--always-cpu']
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
Fooocus version: 2.1.860
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 15912 MB, total RAM 15912 MB
Set vram state to: DISABLED
Always offload VRAM
Device: cpu
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/myName/Documents/img-gen/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/myName/Documents/img-gen/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8454048247502736915
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] brown horse on the beach, cinematic, epic, dramatic ambient, professional, highly detailed, extremely beautiful, emotional, cute, symmetry, intricate, light, surreal, pretty, inspiring, elegant, crisp sharp focus, artistic, very inspirational,, novel, romantic, new, cheerful, inspired, generous, color, cool, passionate, vibrant, background, colorful, shiny
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] brown horse on the beach, intricate, elegant, highly detailed, extremely beautiful, glowing, sharp focus, refined, complex, colors, cinematic, surreal, artistic, scenic, attractive, thought, singular, iconic, fine detail, clear, ambient light, full color, perfect composition, symmetry, aesthetic, great, pure, pristine, very inspirational, professional, winning, best
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 12.63 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 119.24 seconds
  0%|                                                                                                                                                                  | 0/30 [00:00<?, ?it/s]
/home/myName/anaconda3/envs/fooocus/lib/python3.10/site-packages/psutil/__init__.py:1973: RuntimeWarning: available memory stats couldn't be determined and was set to 0
  ret = _psplatform.virtual_memory()

  7%|██████████                                                                                                                                             | 2/30 [07:04<1:37:07, 208.12s/it]^CKeyboard interruption in main thread... closing server.

nvidia-smi output, driver and CUDA versions (which should be compatible with the current torch version):

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:08:00.0  On |                  N/A |
|  0%   18C    P8              18W / 350W |    752MiB / 24576MiB |     10%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2320      G   /usr/lib/xorg/Xorg                          245MiB |
|    0   N/A  N/A      2430      G   /usr/bin/gnome-shell                        110MiB |
|    0   N/A  N/A      3106      G   ...sion,SpareRendererForSitePerProcess       52MiB |
|    0   N/A  N/A      3334      G   firefox                                     325MiB |
+---------------------------------------------------------------------------------------+

EDIT: After downgrading the driver to 535.129.03 just to be sure, the results remain the same.

I looked in the docs and in other issues for how to go about debugging this, but it is not clear to me. I'd love to help contribute if there are some resources I can start with.

mashb1t removed the question label on Jan 7, 2024
@Laurent-VueJS

I have the same issue (Ryzen 9 7900X, 64 GB RAM, of which 16 GB is assigned as VRAM for the integrated GPU). On Linux I get exactly the same behavior. On Windows (I dual boot) it runs more or less OK; I sometimes get a crash because of the memory leak issue, but it works. Could it be linked to the ROCm version, which differs in the Linux/AMD instructions (5.6 instead of 5.7)? I have read that it does not go well with the VAE version (?). I tried a manual upgrade of ROCm, but it caused other problems. I found this interesting on the same subject: https://www.reddit.com/r/comfyui/comments/15b8lxd/comfyui_is_not_detecting_my_gpus_vram/
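
One way to check which ROCm release the installed torch wheel was actually built against (a sketch; torch.version.hip is None on non-ROCm builds):

$ python -c "import torch; print(torch.__version__, torch.version.hip)"
$ python -c "import torch; print(torch.cuda.is_available())"   # ROCm devices show up through the CUDA API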

cgerardin commented Jan 11, 2024

Hello,
Same issue here.

  • Nobara Linux 39 (kernel 6.6.9, Python 3.10 with venv)
  • AMD Ryzen 7 2700X
  • AMD ATI Radeon RX 6600 (8 GB)
  • 48 GB RAM
  • 85 GB swap

It works with --always-cpu.

Feel free to tell me if I can run some tests or provide more information.

Athoir commented Jan 12, 2024

Hello, I had the same issue with the segfault on the following hardware:

  • EndeavourOS (kernel 6.6.10-arch1-1)
  • AMD Ryzen 7 5800X
  • AMD ATI Radeon RX 7800 XT
  • 64 GB RAM
  • no swap

I managed to make it run by doing the following:

I can't test whether it works on the RX 6000 series, as my previous card is fried.

Hope this helps 😄

carnager (Author) commented Jan 12, 2024

Sadly this does not work for me...

Memory access fault by GPU node-1 (Agent handle: 0x7f6c79b37c80) on address 0x7f6d60e85000. Reason: Page not present or supervisor privilege.

OK, I could make it run with the low-VRAM option, but it never finishes generating any images.

cgerardin commented Jan 12, 2024

Thank you @Athoir, but it's exactly the same as for @carnager: it runs with the --attention-split and --always-low-vram options, but fails shortly after the beginning of the image generation:

Memory access fault by GPU node-1 (Agent handle: 0x7facdd668c60) on address 0x7fae2da8b000. Reason: Page not present or supervisor privilege.
Abandon (core dumped)

Complete steps to reproduce on Fedora / Nobara:

$ sudo dnf install python3.10 rocm-opencl rocm-hip-runtime
$ python3.10 -m venv fooocus_env
$ source fooocus_env/bin/activate
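# the following commands assume you are inside the Fooocus checkout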
$ pip install -r requirements_versions.txt
$ pip uninstall torch torchvision torchaudio torchtext functorch xformers 
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
$ HSA_OVERRIDE_GFX_VERSION=11.0.0 python entry_with_update.py --attention-split --always-low-vram

Perhaps related to the torch version? (see AUTOMATIC1111/stable-diffusion-webui#8139 (comment))

@OronDF343

For RX 6700 XT, setting HSA_OVERRIDE_GFX_VERSION=10.3.0 helped, as mentioned here
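
Concretely, that means launching with the variable set for the process, e.g. (a sketch, using the launch command from earlier in this thread):

$ HSA_OVERRIDE_GFX_VERSION=10.3.0 python launch.py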

@carnager (Author)

> For RX 6700 XT, setting HSA_OVERRIDE_GFX_VERSION=10.3.0 helped, as mentioned here

Not for me...

[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] cosy bear reading a book, warm colors, cinematic, highly detailed, incredible quality, very inspirational, thought, fancy, epic, singular background, elegant, intricate, dynamic light, beautiful, enhanced, bright, colorful, color, illuminated, inspired, deep rich vivid, coherent, glowing, complex, amazing, symmetry, full composed, brilliant, perfect composition, pure
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] cosy bear reading a book, light flowing magic, cool colors, glowing, amazing, highly detailed, intricate, sharp focus, professional animated, vivid, best, contemporary, modern, romantic, inspired, new, creative, beautiful, attractive, advanced, cinematic, artistic color, surreal, emotional, cute, adorable, perfect, focused, positive, exciting, lucid, joyful
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 5.71 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828

...and then nothing happens.

@cgerardin

Working FAST with HSA_OVERRIDE_GFX_VERSION=10.3.0 (with and without --attention-split)!
Many thanks @OronDF343

@Senshi00

Also an RX 6700 XT user here; using HSA_OVERRIDE_GFX_VERSION=10.3.0 helped.

TheNexter commented Jan 17, 2024

> Also an RX 6700 XT user here; using HSA_OVERRIDE_GFX_VERSION=10.3.0 helped.

I can confirm: on a 6600 XT this solves the problem.

mashb1t closed this as completed on Feb 22, 2024
mashb1t added the bug (AMD) label and removed the bug label on Feb 22, 2024
@hqnicolas

I'm running here without any problems using this gist; you need to set HSA_OVERRIDE_GFX_VERSION=10.3.0 for Radeon 6000 series cards:
https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd
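
A sketch of making the override persistent instead of setting it per-run (assumes a bash shell; adjust for your shell):

$ echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/.bashrc
$ source ~/.bashrc
$ python entry_with_update.py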
