
Apple Macbook Pro M1 extremely slow #1446

Open
rottitime opened this issue Dec 16, 2023 · 14 comments
Labels
help wanted Extra attention is needed

Comments


rottitime commented Dec 16, 2023

Issue

I have installed Fooocus on an Apple MacBook Pro 2021, M1.

It is taking 15 to 30 minutes to create each image. People on various forums claim it only takes about 1 minute on the same device. Can anyone advise on how to speed up image creation?

Full Console Log

[Prompt Expansion] pikachu, cool color perfect colors, detailed, strong crisp, heroic, cinematic, dramatic, professional, symmetry, great composition, dynamic light, atmosphere, vivid, beautiful, emotional, highly detail, intricate, stunning, enhanced, inspired, colorful, shiny, transparent, lovely, cute, divine, elegant, coherent, pretty, best, novel, background, fine
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] pikachu, epic, beautiful, elegant, intricate, cinematic, highly detailed, artistic, sharp focus, colorful, surreal, dramatic ambient light, open background, magic, cute, adorable, magical, thought, extremely coherent, charismatic, iconic, creative, positive, awesome, joyful, pure, very inspirational, bright, friendly, glowing, clear, color, inspired
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1280, 768)
Preparation time: 31.29 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 70.69 seconds
100%|█████████████████████████████████████████████| 8/8 [07:27<00:00, 55.99s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.86 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 524.25 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 65.45 seconds
100%|█████████████████████████████████████████████| 8/8 [07:35<00:00, 56.88s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.09 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 526.07 seconds
Total time: 1085.09 seconds

Setup

Device: Apple M1 MacBook Pro, OS: 14.2 (23C64)
Memory: 16 GB
Model: juggernautXL_version6Rundiffusion.safetensors
Conda 23.11.0
Python 3.11.5

MPS enabled

'Metal Performance Shaders' (MPS) is enabled. After following the "Accelerated PyTorch training on Mac" guide, I get the following output:

tensor([1.], device='mps:0')
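For reference, the check above can be reproduced with a short snippet (a sketch; assumes torch >= 1.12 built with Metal support):

```python
# Minimal MPS sanity check (assumes torch >= 1.12 with the Metal backend)
import torch

if torch.backends.mps.is_available():
    x = torch.ones(1, device="mps")
    print(x)  # tensor([1.], device='mps:0')
else:
    print("MPS not available; PyTorch will run on CPU")
```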

Settings

Style

Screenshot 2023-12-16 at 16 16 39

Model

Screenshot 2023-12-16 at 16 16 30

Setting

Screenshot 2023-12-16 at 16 16 20
@rottitime rottitime changed the title Macbook Pro M1 Extremely slow Apple Macbook Pro M1 extremely slow Dec 16, 2023
@foreignstyle

Here's what I found regarding your problem. Let me know if it works!

Troubleshoot Error: "I am using Mac, the speed is very slow."

Some Mac users may need --disable-offload-from-vram to speed up model loading.

@rottitime (Author)

Here's what I found regarding your problem. Let me know if it works!

Troubleshoot Error: "I am using Mac, the speed is very slow."

Some Mac users may need --disable-offload-from-vram to speed up model loading.

Thank you @foreignstyle for the suggestion. Sadly, it didn't make any difference.

(fooocus) jaspaul@MacBook-Pro Fooocus % python entry_with_update.py --disable-offload-from-vram 
Fast-forward merge
Update succeeded.
[System ARGV] ['entry_with_update.py', '--disable-offload-from-vram']
Python 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ]
Fooocus version: 2.1.844
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 16384 MB, total RAM 16384 MB
Set vram state to: SHARED
Device: mps
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/Users/jaspaul/Public/repos/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
Enter LCM mode.
[Fooocus] Downloading LCM components ...
[Parameters] Adaptive CFG = 1.0
[Parameters] Sharpness = 0.0
[Parameters] ADM Scale = 1.0 : 1.0 : 0.0
[Parameters] CFG = 1.0
[Parameters] Seed = 1100476089425728703
[Parameters] Sampler = lcm - lcm
[Parameters] Steps = 8 - 8
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors
Request to load LoRAs [['None', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0], ('sdxl_lcm_lora.safetensors', 1.0)] for model [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors].
Loaded LoRA [/Users/jaspaul/Public/repos/Fooocus/models/loras/sdxl_lcm_lora.safetensors] for UNet [/Users/jaspaul/Public/repos/Fooocus/models/checkpoints/sd_xl_base_1.0.safetensors] with 788 keys at weight 1.0.
Requested to load SDXLClipModel
Loading 1 new model
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] pikachu, glowing, shiny, bright, detailed, very intricate, cinematic, stunning, winning, highly colorful, deep colors, inspired, original, fine detail, enhanced, color, perfect, vibrant, symmetry, vivid, coherent, sharp focus, complex, extremely quality, futuristic, professional, creative, appealing, cheerful, amazing, atmosphere, directed, dramatic, thought
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] pikachu, vibrant, magic, vivid colors, intricate, elegant, highly detailed, professional, artistic, cinematic,, singular, clear, pristine, thoughtful, inspired, charismatic, beautiful, illuminated, pretty, attractive, colorful, best, dramatic, perfect, sharp focus, divine, amazing, astonishing, marvelous, flowing, enormous, luxury, very inspirational, cool
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1280, 768)
Preparation time: 20.71 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 54.27 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [06:41<00:00, 50.13s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.35 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 461.61 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970141649246216, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 58.31 seconds
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [07:33<00:00, 56.64s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.91 seconds
Image generated with private log at: /Users/jaspaul/Public/repos/Fooocus/outputs/2023-12-16/log.html
Generating and saving time: 516.92 seconds
Total time: 1001.73 seconds

@maxrx215

I have the same configuration. Using the following command improved it somewhat, to about 30.40 s/it, but the improvement is not significant:
python entry_with_update.py --always-cpu

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
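The tokenizers warning itself is harmless, but it can be silenced by setting the variable before the library is imported, e.g. (a generic sketch, not a Fooocus-specific fix):

```python
# Silence the huggingface/tokenizers fork warning; must run before
# transformers/tokenizers is first imported in the process.
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"
```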


eddyizm commented Dec 22, 2023

Might be memory, but it seems others have chimed in with the same setup. I have 64 GB and, set to Extreme Speed, it pumps out about 4 renders in a few minutes.

@mashb1t mashb1t added the help wanted Extra attention is needed label Dec 28, 2023

99kpv commented Jan 21, 2024

I have the same problem:

[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8018492891930229499
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] Home-based typing, highly detailed, sharp focus, elegant, intricate, cinematic, new classic, epic composition, colorful, mystical, scenic, rich deep colors, inspired, illuminated, amazing, very inspirational, shiny, smart, thought inspiring, wonderful, dramatic, artistic, color, perfect, dynamic light, great, atmosphere, marvelous,, luxury, beautiful, gorgeous
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] Home-based typing, vivid colors, sharp focus, elegant, highly detailed, innocent, formal, cute, determined, color, cool, background, dramatic light, professional, charming, best, pretty, sunny, illuminated, attractive, beautiful, epic, stunning, gorgeous, breathtaking, creative, positive, artistic, loving, healthy, vibrant, passionate, lovely, relaxed
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 6.45 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 83.57 seconds
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
0%| | 0/30 [00:00<?, ?it/s]
/Users/mac/Fooocus/modules/anisotropic.py:132: UserWarning: The operator 'aten::std_mean.correction' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
7%|███████▍ | 2/30 [05:19<1:14:48, 160.32s/it]

Extremely slow: estimated time is 1:14:48.
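The UserWarning above means `aten::std_mean` is dispatched to the CPU when running on the MPS backend. As an illustration only (my own assumption, not a tested Fooocus patch), the same statistics can be computed from ops that are typically MPS-supported:

```python
# Hypothetical workaround sketch: replace torch.std_mean (which falls back
# to CPU on MPS) with mean + var.sqrt(), computed separately.
import torch

g = torch.randn(1, 3, 8, 8)  # stand-in for the tensor used in anisotropic.py
m = g.mean(dim=(1, 2, 3), keepdim=True)
s = g.var(dim=(1, 2, 3), keepdim=True, unbiased=True).sqrt()

# Numerically matches the original: s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
s_ref, m_ref = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
assert torch.allclose(s, s_ref) and torch.allclose(m, m_ref)
```

Whether this actually helps depends on how much of the total step time that one operator accounts for.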

@TattyDon

I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.

python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic


99kpv commented Jan 22, 2024

I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.

python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic

Thanks! I've seen an improvement, but at 36.39 s/it it's still far from desirable.

App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 3.0
[Parameters] Seed = 7339689169583121557
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] Home-based typing, attractive detailed, charming, delightful, professional, highly coherent, color excellent composition, dramatic calm intense cinematic light, beautiful detail, aesthetic, very inspirational, rich deep colors, inspired, lovely, cute, adorable, marvelous, intricate, epic, elegant, sharp focus, fabulous atmosphere, amazing, thought, iconic, perfect background, gorgeous, stunning, enormous
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] Home-based typing, highly detailed, sharp focus, cinematic, ambient, modern, structured, vivid, beautiful, expressive, pretty, attractive, classy, inspired, rich, color, illuminated, light, saturated, designed, deep clear, full, coherent, creative, positive, loving, vibrant, perfect, focused, lovely, cute, best, detail, bright, fabulous
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 7.58 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 10.77 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [17:58<00:00, 35.96s/it]
Requested to load AutoencoderKL
Loading 1 new model
Image generated with private log at: /Users/mac/Fooocus/outputs/2024-01-22/log.html
Generating and saving time: 1383.06 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [18:56<00:00, 37.90s/it]
Image generated with private log at: /Users/mac/Fooocus/outputs/2024-01-22/log.html
Generating and saving time: 1481.11 seconds
Total time: 2884.07 seconds

@mlisovyi

I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.

python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic

Thanks @TattyDon.
Also on an Apple M1 with 16 GB, this reduced iteration time from the original ~50 s down to 10-12 s (rather than the ~35 s observed by @99kpv).


eddyizm commented Feb 24, 2024

I can't claim credit for this. Someone else suggested using this and it speeds everything up hugely.

python entry_with_update.py --always-cpu --disable-offload-from-vram --unet-in-fp8-e5m2 --preset realistic

Thanks @TattyDon.
Also on an Apple M1 with 16 GB, this reduced iteration time from the original ~50 s down to 10-12 s (rather than the ~35 s observed by @99kpv).

Is that the only thing you changed, or did you make any additional configuration changes? I feel like we should start a short discussion collecting all the best tips to improve performance on Apple silicon.

@TattyDon

That's all I have changed. It's down to about 20 seconds per iteration for me. Not perfect, but also not unusable (M1).

@originalmagneto

Has anyone here heard of Apple MLX? https://github.com/ml-explore/mlx

I'm tired of using these general-purpose, NVIDIA-oriented frameworks and seeing people on Apple Silicon be surprised that their computers are not performing as expected.

Someone should break the status quo and try integrating this framework into their LLMs and apps :D Cheers!

@TorAllex

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --preset realistic
I don't have M-series silicon; my Mac is an Intel-based Hackintosh.

100%|██████████████████████████████████████████████████| 6/6 [01:40<00:00, 16.72s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.07 seconds
[Fooocus] Saving image 1/1 to system ...
Image generated with private log at: /Users/alex/Fooocus/outputs/2024-06-17/log.html
Generating and saving time: 113.58 seconds
Total time: 119.55 seconds

@yar4irus

Can you tell me where to put this script? I'm very bad at this.

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --preset realistic

@TorAllex

Can you tell me where to put this script? I'm very bad at this.

python entry_with_update.py --all-in-fp16 --attention-pytorch --disable-offload-from-vram --always-high-vram --gpu-device-id 0 --async-cuda-allocation --preset realistic

Just type it in a terminal, from inside the Fooocus folder.
