
Apple Macbook Pro M1 extremely slow #1446

Open
rottitime opened this issue Dec 16, 2023 · 14 comments
Labels
help wanted Extra attention is needed

Comments


rottitime commented Dec 16, 2023

Issue

I have installed Fooocus on an Apple MacBook Pro 2021, M1.

It is taking 15 to 30 minutes to create each image. People on various forums claim it only takes about 1 minute on the same device. Can anyone advise on how to speed up image creation?

Full Console Log

[Prompt Expansion] pikachu, cool color perfect colors, detailed, strong crisp, heroic, cinematic, dramatic, professional, symmetry, great composition, dynamic light, atmosphere, vivid, beautiful, emotional, highly detail, intricate, stunning, enhanced, inspired, colorful, shiny, transparent, lovely, cute, divine, elegant, coherent, pretty, best, novel, background, fine
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] pikachu, epic, beautiful, elegant, intricate, cinematic, highly detailed, artistic, sharp focus, colorful, surreal, dramatic ambient light, open background, magic, cute, adorable, magical, thought, extremely coherent, charismatic, iconic, creative, positive, awesome, joyful, pure, very inspirational, bright, friendly, glowing, clear, color, inspired
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1280, 768)
Preparation time: 31.29 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 70.69 seconds
100%|█████████████████████████████████████████████| 8/8 [07:27<00:00, 55.99s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.86 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 524.25 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 65.45 seconds
100%|█████████████████████████████████████████████| 8/8 [07:35<00:00, 56.88s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.09 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 526.07 seconds
Total time: 1085.09 seconds

Setup

Device: Apple M1 MacBook Pro, OS: 14.2 (23C64)
Memory: 16 GB
Model: juggernautXL_version6Rundiffusion.safetensors
Conda 23.11.0
Python 3.11.5

MPS enabled

'Metal Performance Shaders' (MPS) is enabled. After following the "Accelerated PyTorch training on Mac" guide, I get the following output:

tensor([1.], device='mps:0')
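For reference, the check above can be reproduced with a short snippet (a sketch; assumes torch >= 1.12 built with Metal support):

```python
# Minimal MPS sanity check (assumes torch >= 1.12 with the Metal backend)
import torch

if torch.backends.mps.is_available():
    x = torch.ones(1, device="mps")
    print(x)  # tensor([1.], device='mps:0')
else:
    print("MPS not available; PyTorch will run on CPU")
```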

Settings

Style

Screenshot 2023-12-16 at 16 16 39

Model

Screenshot 2023-12-16 at 16 16 30

Setting

Screenshot 2023-12-16 at 16 16 20
@rottitime rottitime changed the title Macbook Pro M1 Extremely slow Apple Macbook Pro M1 extremely slow Dec 16, 2023
@foreignstyle

Here's what I found regarding your problem. Let me know if it works!

Troubleshoot Error: "I am using Mac, the speed is very slow."

Some Mac users may need --disable-offload-from-vram to speed up model loading.

@rottitime (Author)

Here's what I found regarding your problem. Let me know if it works!

Troubleshoot Error: "I am using Mac, the speed is very slow."

Some Mac users may need --disable-offload-from-vram to speed up model loading.

Thank you @foreignstyle for the suggestion. Sadly, it didn't make any difference.

(fooocus) jaspaul@MacBook-Pro Fooocus % python entry_with_update.py --disable-offload-from-vram 
Fast-forward merge
Update succeeded.
[System ARGV] ['entry_with_update.py', '--disable-offload-from-vram']
Python 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ]
Fooocus version: 2.1.844
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 16384 MB, total RAM 16384 MB
Set vram state to: SHARED
Device: mps
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/Users/jaspaul/Public/repos/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
Enter LCM mode.
[Fooocus] Downloading LCM components ...
[Parameters] Adaptive CFG = 1.0
[Parameters] Sharpness = 0.0
[Parameters] ADM Scale = 1.0 : 1.0 : 0.0
[Parameters] CFG = 1.0
[Parameters] Seed = 1100476089425728703
[Parameters] Sampler = lcm - lcm
[Parameters] Steps = 8 - 8
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors
Request to load LoRAs [['None', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0], ('sdxl_lcm_lora.safetensors', 1.0)] for model [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors].
Loaded LoRA [/Users/jaspaul/Public/repos/Fooocus/models/loras/sdxl_lcm_lora.safetensors] for UNet [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors] with 788 keys at weight 1.0.
Requested to load SDXLClipModel
Loading 1 new model
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] pikachu, glowing, shiny, bright, detailed, very intricate, cinematic, stunning, winning, highly colorful, deep colors, inspired, original, fine detail, enhanced, color, perfect, vibrant, symmetry, vivid, coherent, sharp focus, complex, extremely quality, futuristic, professional, creative, appealing, cheerful, amazing, atmosphere, directed, dramatic, thought
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] pikachu, vibrant, magic, vivid colors, intricate, elegant, highly detailed, professional, artistic, cinematic,, singular, clear, pristine, thoughtful, inspired, charismatic, beautiful, illuminated, pretty, attractive, colorful, best, dramatic, perfect, sharp focus, divine, amazing, astonishing, marvelous, flowing, enormous, luxury, very inspirational, cool
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1280, 768)
Preparation time: 20.71 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 54.27 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [06:41<00:00, 50.13s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.35 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 461.61 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 58.31 seconds
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [07:33<00:00, 56.64s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.91 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 516.92 seconds
Total time: 1001.73 seconds

@maxrx215

I have the same configuration. Using the following command improved it somewhat, to about 30.40 s/it, but the improvement is not significant:
python entry_with_update.py --always-cpu

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
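The tokenizers warning itself is harmless, but it can be silenced by setting the variable before the library is imported, e.g. (a generic sketch, not a Fooocus-specific fix):

```python
# Silence the huggingface/tokenizers fork warning; must run before
# transformers/tokenizers is first imported in the process.
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"
```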


eddyizm commented Dec 22, 2023

Might be memory, but it seems others have chimed in with the same setup. I have 64 GB and, set to Extreme Speed, it pumps out about 4 renders in a few minutes.

@mashb1t mashb1t added the help wanted Extra attention is needed label Dec 28, 2023

99kpv commented Jan 21, 2024

I have the same problem:

[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8018492891930229499
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] Home-based typing, highly detailed, sharp focus, elegant, intricate, cinematic, new classic, epic composition, colorful, mystical, scenic, rich deep colors, inspired, illuminated, amazing, very inspirational, shiny, smart, thought inspiring, wonderful, dramatic, artistic, color, perfect, dynamic light, great, atmosphere, marvelous,, luxury, beautiful, gorgeous
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] Home-based typing, vivid colors, sharp focus, elegant, highly detailed, innocent, formal, cute, determined, color, cool, background, dramatic light, professional, charming, best, pretty, sunny, illuminated, attractive, beautiful, epic, stunning, gorgeous, breathtaking, creative, positive, artistic, loving, healthy, vibrant, passionate, lovely, relaxed
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 6.45 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 83.57 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
0%| | 0/30 [00:00<?, ?it/s]
/Users/mac/Fooocus/modules/anisotropic.py:132: UserWarning: The operator 'aten::std_mean.correction' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
7%|███████▍ | 2/30 [05:19<1:14:48, 160.32s/it]

Extremely slow: estimated time is 1:14:48.
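The UserWarning above means `aten::std_mean` is dispatched to the CPU when running on the MPS backend. As an illustration only (my own assumption, not a tested Fooocus patch), the same statistics can be computed from ops that are typically MPS-supported:

```python
# Hypothetical workaround sketch: replace torch.std_mean (which falls back
# to CPU on MPS) with mean + var.sqrt(), computed separately.
import torch

g = torch.randn(1, 3, 8, 8)  # stand-in for the tensor used in anisotropic.py
m = g.mean(dim=(1, 2, 3), keepdim=True)
s = g.var(dim=(1, 2, 3), keepdim=True, unbiased=True).sqrt()

# Numerically matches the original: s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
s_ref, m_ref = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
assert torch.allclose(s, s_ref) and torch.allclose(m, m_ref)
```

Whether this actually helps depends on how much of the total step time that one operator accounts for.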

@TattyDon

I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.

python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic


99kpv commented Jan 22, 2024

I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.

python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic

Thanks! I've seen an improvement, but at 36.39 s/it it's still far from desirable.

App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 3.0
[Parameters] Seed = 7339689169583121557
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] Home-based typing, attractive detailed, charming, delightful, professional, highly coherent, color excellent composition, dramatic calm intense cinematic light, beautiful detail, aesthetic, very inspirational, rich deep colors, inspired, lovely, cute, adorable, marvelous, intricate, epic, elegant, sharp focus, fabulous atmosphere, amazing, thought, iconic, perfect background, gorgeous, stunning, enormous
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] Home-based typing, highly detailed, sharp focus, cinematic, ambient, modern, structured, vivid, beautiful, expressive, pretty, attractive, classy, inspired, rich, color, illuminated, light, saturated, designed, deep clear, full, coherent, creative, positive, loving, vibrant, perfect, focused, lovely, cute, best, detail, bright, fabulous
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 7.58 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 10.77 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [17:58<00:00, 35.96s/it]
Requested to load AutoencoderKL
Loading 1 new model
Image generated with private log at: /Users/mac/Fooocus/outputs/2024-01-22/log.html
Generating and saving time: 1383.06 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [18:56<00:00, 37.90s/it]
Image generated with private log at: /Users/mac/Fooocus/outputs/2024-01-22/log.html
Generating and saving time: 1481.11 seconds
Total time: 2884.07 seconds

@mlisovyi

I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.

python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic

Thanks @TattyDon.
Also on an Apple M1 with 16 GB, this reduced iteration time from the original ~50 s down to 10-12 s (rather than the ~35 s observed by @99kpv).


eddyizm commented Feb 24, 2024

I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.

python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic

Thanks @TattyDon.
Also on an Apple M1 with 16 GB, this reduced iteration time from the original ~50 s down to 10-12 s (rather than the ~35 s observed by @99kpv).

Is that the only thing you changed, or did you make any additional configuration changes? I feel like we should start a short discussion collecting all the best tips to improve performance on Apple silicon.

@TattyDon

That's all I have changed. It's down to about 20 seconds per iteration for me. Not perfect, but also not unusable (M1).

@originalmagneto

Has anyone here heard of Apple MLX? https://github.com/ml-explore/mlx

I'm tired of using these general-purpose, NVIDIA-oriented frameworks and seeing people on Apple Silicon be surprised that their computers are not performing as expected.

Someone should break the status quo and try integrating this framework into their LLMs and apps :D Cheers!

@TorAllex

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --preset realistic
I don't have M-series silicon; my Mac is an Intel-based Hackintosh.

100%|██████████████████████████████████████████████████| 6/6 [01:40<00:00, 16.72s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.07 seconds
[Fooocus] Saving image 1/1 to system ...
Image generated with private log at: /Users/alex/Fooocus/outputs/2024-06-17/log.html
Generating and saving time: 113.58 seconds
Total time: 119.55 seconds

@yar4irus

Can you tell me where to put this script? I'm very bad at this.

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --preset realistic

@TorAllex

Can you tell me where to put this script? I'm very bad at this.

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --preset realistic

Just type it in a terminal, from inside the Fooocus folder.
