MPS keeps crashing #68

Closed
enzyme69 opened this issue Feb 15, 2023 · 23 comments · Fixed by #143
Labels
bug Something isn't working

Comments

@enzyme69

I got the ControlNet extension loading fine, but it keeps on crashing when I use scribble:

  0%|                                                    | 0/20 [00:00<?, ?it/s](mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:228:0: error: 'mps.add' op requires the same element type for all operands and results
(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:228:0: note: see current operation: %5 = "mps.add"(%4, %arg2) : (tensor<2x1280xf32>, tensor<*xf16>) -> tensor<*xf32>
zsh: segmentation fault  ./webui.sh
/opt/homebrew/Cellar/[email protected]/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
@Mikubill added the bug label Feb 15, 2023
@Philbuck84

Getting the same error when I use any model

Loading preprocessor: depth, model: control_sd15_depth [fef5e48e]
Loaded state_dict from [/Users/philbuck/sd/extensions/sd-webui-controlnet/models/control_sd15_depth.pth]
ControlNet model control_sd15_depth [fef5e48e] loaded.
0%| | 0/16 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/0aa643d0-625a-11ed-b319-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort ./webui.sh
(base) philbuck@PhilsMacStudio sd % /opt/homebrew/Cellar/[email protected]/3.10.9/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

@jwooldridge234

You can get it working if you use --no-half, but it's obviously a lot slower and uses a lot more memory. Hoping for a solution to this; the tool looks really cool.

@Philbuck84

My Web UI launches with the argument --no-half-vae; does that have a different effect than --no-half?

@jwooldridge234

Yeah, I still get the tensor type mismatch with --no-half-vae; only --no-half fixes it. I'm not the most knowledgeable on PyTorch, but I believe the issue is triggered when an operation mixes a tensor of type float16 with a tensor of type float32. --no-half forces everything to use float32 and fixes the issue, but at a significant cost to performance.
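
To illustrate, here is a minimal sketch of that kind of mismatch; the shapes mirror the tensors in the MPSGraph error above, not the extension's actual variables.

import torch

# Pick MPS when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

emb = torch.randn(2, 1280, dtype=torch.float32, device=device)      # like tensor<2x1280xf32>
control = torch.randn(2, 1280, dtype=torch.float16, device=device)  # like tensor<*xf16>

# Mixing element types in a single MPSGraph add is what the
# "'mps.add' op requires the same element type" error complains about.
# Aligning the dtypes explicitly avoids it:
out = emb + control.to(emb.dtype)
print(out.dtype)  # torch.float32, which is effectively what --no-half enforces everywhere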

@Philbuck84

Hmm, --no-half unfortunately doesn't fix it for me. I get a whole different set of errors.

@jwooldridge234

jwooldridge234 commented Feb 15, 2023

Interesting... mind posting them and your system specs?

@Philbuck84

Sure, system specs are a Mac Studio M1 Ultra running Ventura 13.1.
The error is super long:

ControlNet model control_sd15_openpose [fef5e48e] loaded.
Error running process: /Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/controlnet.py
Traceback (most recent call last):
File "/Users/philbuck/sd/modules/scripts.py", line 386, in process
script.process(p, *script_args)
File "/Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/controlnet.py", line 270, in process
input_image = HWC3(image['image'])
TypeError: 'NoneType' object is not subscriptable

0%| | 0/20 [00:00<?, ?it/s]
Error completing request
Arguments: ('task(5q0lfqy3o0p0qe2)', 'Dog', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 768, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, 'keyword prompt', 'keyword1, keyword2', 'None', 'textual inversion first', True, 'openpose', 'control_sd15_openpose [fef5e48e]', 1, None, False, 'Scale to Fit (Inner Fit)', False, False, False, 3, 0, False, False, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0, None, True, None, None, False, 10.0, True, 30.0, True, 0.0, 'Lanczos', 1) {}
Traceback (most recent call last):
File "/Users/philbuck/sd/modules/call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "/Users/philbuck/sd/modules/call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "/Users/philbuck/sd/modules/txt2img.py", line 56, in txt2img
processed = process_images(p)
File "/Users/philbuck/sd/modules/processing.py", line 486, in process_images
res = process_images_inner(p)
File "/Users/philbuck/sd/modules/processing.py", line 628, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "/Users/philbuck/sd/modules/processing.py", line 828, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "/Users/philbuck/sd/modules/sd_samplers_kdiffusion.py", line 323, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/Users/philbuck/sd/modules/sd_samplers_kdiffusion.py", line 221, in launch_sampling
return func()
File "/Users/philbuck/sd/modules/sd_samplers_kdiffusion.py", line 323, in
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/opt/homebrew/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/philbuck/sd/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/philbuck/sd/modules/sd_samplers_kdiffusion.py", line 116, in forward
x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/philbuck/sd/repositories/k-diffusion/k_diffusion/external.py", line 114, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "/Users/philbuck/sd/repositories/k-diffusion/k_diffusion/external.py", line 140, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "/Users/philbuck/sd/modules/sd_hijack_utils.py", line 17, in
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "/Users/philbuck/sd/modules/sd_hijack_utils.py", line 28, in call
return self.__orig_func(*args, **kwargs)
File "/Users/philbuck/sd/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/philbuck/sd/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
out = self.diffusion_model(x, t, context=cc)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/cldm.py", line 107, in forward2
return forward(*args, **kwargs)
File "/Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/cldm.py", line 72, in forward
control = outer.control_model(x=x, hint=outer.hint_cond, timesteps=timesteps, context=context)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/philbuck/sd/extensions/sd-webui-controlnet/scripts/cldm.py", line 381, in forward
guided_hint = self.input_hint_block(hint, emb, context)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/philbuck/sd/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 86, in forward
x = layer(x)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/philbuck/sd/extensions-builtin/Lora/lora.py", line 182, in lora_Conv2d_forward
return lora_forward(self, input, torch.nn.Conv2d_forward_before_lora(self, input))
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 457, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/opt/homebrew/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
TypeError: conv2d() received an invalid combination of arguments - got (NoneType, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:

  • (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
    didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)
  • (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
    didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)

@jwooldridge234

Hmm. I wonder if we might be running different versions of PyTorch. Are you using the Mac-specific build discussed here? I recommend using it even though it doesn't resolve this issue, since it provides a ~25% speed boost on MPS.

@Philbuck84

I was not running that Mac-specific build. Thanks for tipping me to that resource. I'm installing now and will try ControlNet again to see if I get the same error.

@Philbuck84

Unfortunately, still running into the same errors even after updating to the mac-specific build. 🤷‍♂️

@jwooldridge234

Hmm. When you launch, do you use webui.sh?

@Philbuck84

Yes

@jwooldridge234

And you've pulled the latest Automatic update & sd-webui-controlnet update, correct? Just trying to figure out what could be different in our setup.

@Philbuck84

Philbuck84 commented Feb 15, 2023

Wow, ok I did need to update ControlNet and it's working with --no-half. Thank you!

@jwooldridge234

jwooldridge234 commented Feb 15, 2023

No worries! Glad it helped. Hopefully we can get a solution that allows us to use float16.

@jwooldridge234

With the command line args --opt-sub-quad-attention and --no-half, it runs about twice as fast for me (7-8 s/it vs. 20 s/it). Still terrible, but a bit better.

@brkirch
Contributor

brkirch commented Feb 15, 2023

Yes, float16 doesn’t work correctly with MPS on this extension yet. I will try to fix that but I can’t make any guarantees at this point. First though I want to fix the normal and depth map preprocessors returning bad/inconsistent results on MPS.
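
As a rough illustration only (not the actual change), one common way to avoid the fp16/fp32 mixing is to cast the control hint to the dtype of the ControlNet weights before the call; the names control_model and hint_cond below are borrowed from the traceback earlier in this thread, and the real fix may look quite different.

# Hypothetical sketch, not the extension's real code.
def call_control_model(outer, x, timesteps, context):
    # Match whatever dtype the ControlNet weights are stored in (fp16 or fp32).
    model_dtype = next(outer.control_model.parameters()).dtype
    hint = outer.hint_cond.to(dtype=model_dtype, device=x.device)
    return outer.control_model(
        x=x.to(model_dtype), hint=hint, timesteps=timesteps, context=context
    )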

@jwooldridge234

Sounds good. I'll take a look after work and see if I can debug.

@mylife4aiur5

[Screenshot 2023-02-15 at 11:40:34 PM]
I have to assume that v21 is now required to use ControlNet. Whenever I use v15 models, it crashes.

@Mikubill mentioned this issue Feb 16, 2023
@David-Gianotti

I'm also running a Mac Studio M1 Ultra on Ventura 13.1 and have the latest versions of everything (Automatic1111, ControlNet, Python, et al.). I've applied the fixes that worked for Philbuck84 above, but Python still crashes every time I try to render with ControlNet, with the exact same error ending in "There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '"

@David-Gianotti


I can now confirm that it is working for me after adding --no-half to the COMMANDLINE_ARGS line in webui-user.sh.

@enzyme69
Author

For some unknown reason, starting today I keep getting the crash again. It was working fine for many weeks and suddenly...

Loading preprocessor: none
0%| | 0/20 [00:00<?, ?it/s](mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:39:0: error: 'mps.matmul' op contracting dimensions differ 1024 & 768
(mpsFileLoc): /AppleInternal/Library/BuildRoots/9e200cfa-7d96-11ed-886f-a23c4f261b56/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:39:0: note: see current operation: %3 = "mps.matmul"(%arg0, %2) {transpose_lhs = false, transpose_rhs = false} : (tensor<1x77x1024xf32>, tensor<768x320xf32>) -> tensor<1x77x320xf32>
zsh: segmentation fault ./webui.sh
jimmygunawan@192-168-1-100 stable-diffusion-webui % /opt/homebrew/Cellar/[email protected]/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

@enzyme69
Author

If I use the diff_openpose model, it works. But the more recent one keeps crashing my AUTOMATIC1111 web UI.
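
For what it's worth, a "contracting dimensions differ 1024 & 768" error like the one above is the typical symptom of a 1024-dim text context (what SD 2.x checkpoints produce) hitting a cross-attention projection that expects 768-dim context (what the SD 1.5-era ControlNet models were trained for). A minimal sketch that reproduces the same shape mismatch on any device:

import torch

context = torch.randn(1, 77, 1024)            # like tensor<1x77x1024xf32>
to_k = torch.nn.Linear(768, 320, bias=False)  # participates in the matmul as tensor<768x320xf32>

try:
    to_k(context)  # expects a 768-dim last axis, gets 1024
except RuntimeError as err:
    print(err)     # shapes cannot be multiplied (77x1024 and 768x320)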
