BSOD when installing software #364
Comments
It's not clear from the minidump where in the stack we are, but it is in avx/ymm code so perhaps it is related. If you were to set the various |
You mean in registry? Sure, I can test that. EDIT: |
Yeah, just type in generic, and push return. You can hit F5 and it should say "cycle fastest [generic] sse2 sse41 avx2". Although, the top of the stack claims to be in refcounter.. I need to see if I can load the symbols from the release |
Ah ok
ok yeah, nothing to do with |
Could you increase the size of the memory.dmp? It would be useful to see which abd, and the refcount. |
Of course. |
Small update. I crashed the system via powershell to generate a memory dump, maybe this is related. I will still try to provoke a BSOD further, but maybe this memory dump can give pointers already? |
A few cents from me (this may be specific to my Windows configuration): I "removed" 75% of system services without much thought for their dependencies, so my BSODs might be caused indirectly by the config I've got. That being said, I can BSOD pretty fast with multithreaded / parallel heavy I/O software. Those seem to be a monkey wrench in our Windows driver gearbox. How to replicate:
Booom, BSOD. TO CLARIFY: it might be my ignorance of proper (Windows) system configuration and NOT the driver. I have personally used @lundman's awesome driver for 4+ years without any data loss, swapping daily between 3 OSes (also at work), but I also needed to learn ZFS the hard way (which included data loss, but that was the result of my configuration and of not properly RTFM). @lundman: Let me know if you need dumps from W10/W11, I can package some for you. Thanks for the great work! |
Yeah, I would love stacks from any BSOD, so I can fix them |
The Windows OpenZFS driver and the WinBtrfs driver don't play well with each other (I uninstalled WinBtrfs after it caused some weird errors on subvols). If you have those two mixed, you can check how it behaves when they are not both running |
really? huh - I'll try installing both |
Hey there! Sorry for the long silence. I have been testing on and off, but now I finally have a reproducible BSOD test case. Steps to reproduce on my system: create a dataset with dedup and compression=zstd-19, with blocksize 1M as well as xattr=sa. So I will provide you with the Python script I used to create the dummy 32GB text file as well as the file I used, compressed. Python script to generate a 32GB text file (took me a good 5h on my system, needs the pip module "faker"): Here's a link to the 32GB text file crunched down to 10.9GB: |
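For anyone who wants to try something similar without the original script: a rough stand-in, assuming the only goal is a large file of repetitive, English-like text. This is not the faker-based script from the comment above; the word list, the function name `write_dummy_text` and the sizes are made up for illustration, and the output will not compress the same way as the original file.

```python
import random

# Hypothetical word pool; the script mentioned in the thread used the "faker" module instead.
WORDS = ("alpha bravo charlie delta echo foxtrot golf hotel india juliet "
         "kilo lima mike november oscar papa quebec romeo sierra tango").split()


def write_dummy_text(path, target_bytes, seed=0, words_per_line=12):
    """Write lines of random words to `path` until at least `target_bytes` are written.

    Returns the number of bytes written (pure ASCII, so characters == bytes).
    """
    rng = random.Random(seed)  # fixed seed so the same file can be regenerated
    written = 0
    # newline="\n" disables newline translation, keeping the byte count exact on Windows
    with open(path, "w", encoding="ascii", newline="\n") as f:
        while written < target_bytes:
            line = " ".join(rng.choice(WORDS) for _ in range(words_per_line)) + "\n"
            f.write(line)
            written += len(line)
    return written
```

Calling `write_dummy_text("dummy.txt", 32 * 1024**3)` would aim for roughly the size used in the thread; a much smaller `target_bytes` is enough to check that it works.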
I also got the same error code (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED BSOD) on 2.2.3rc3. May or may not be related. I run FreeFileSync in the background and use a zfs pool to keep a few folders in sync between my linux and windows dual boot, so I imagine it was doing something at the time. I hope this helps:
|
Thanks, I can probably lookup that symbol. But since it's coming off IopProcessWorkItem - which we only use in one place, I think I know where to start looking |
Oh, I see
Specifically
|
Taking out the assert (but I don't think that is the right answer) if you want to try |
OK that was tricky. There is indeed an unload BSOD problem, fixed in 01aa832 |
Hello everyone, hello @lundman , Again sorry for my long absence, but I have now been able to reproduce my original BSOD with a full memory dump. SYSTEM_THREAD_EXCEPTION_NOT_HANDLED Full Memory Dump (9.2GB) : |
OK sorry for the delay, had to find a way to fit the memory.dmp on my VM :) This is the cause:
number here is 0, but The cbuf log
Which is interesting, I don't think I have ever triggered the slow-IO path, so that is the best bet to look at what goes wrong. 2.2.3rc2-dirty is a bit old; can you go to the latest when convenient? |
Sure! |
Don't have to chase this particular bug, the memory.dmp provided showed it isn't something we have fixed. One is curious if |
Indeed, I have set zfs_deadman_failmode=continue. Now it seems with continue it simply BSODs the system instead of making the whole filesystem unresponsive. |
Yeah that is interesting, so I think we are looking at
You are using just datasets or zvols? |
I am working directly on a zvol, no datasets created |
Some more observations: |
yeah so with higher timeout, or disabled deadman, it might not die at all? It's in the Registry |
I have done some preliminary testing with deadman timeout set to 5 minutes instead of 1 minute and wait instead of continue. |
OK that is interesting - I do think we should fix the BSOD when it tries to restart IO - or lob it over to upstream and run away. |
@lundman do you have a very recent dirty build somewhere around? The dev VM from MS I used evaluated itself into oblivion... so I need to reconfigure. Thanks to MS ^.^ it's faster to download https://developer.microsoft.com/en-us/windows/downloads/virtual-machines and, uhm, "refurbish" than to install VS manually ;> |
If it's the Win downloaded VM images, you can run |
I am happy to announce that with deadman timeouts set to an unreasonably high value, my ZFS works like a charm! If you think it's applicable, you can close this issue if you'd consider this as resolved. |
System information
Describe the problem you're observing
BSOD while installing software on the ZFS volume.
Had this problem with RC1, continues to persist in RC2.
System was not stalled before the BSOD; I could observe kernel CPU utilization in Task Manager at around 30-40% (~10 threads active), so work was being done, in line with highly compressible data being written to disk.
BSOD is sporadic; I cannot reproduce it on demand. This is the 3rd BSOD thus far; sometimes I can install data for hours on end.
I appended the minidump.
031824-16593-01.dmp
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
Caused by OpenZFS.sys+54de92
Crash Address OpenZFS.sys+1807e0
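The two locations above are given as module name plus hex offset. As a small sketch of how such strings can be split and turned into absolute addresses: the helper names are made up for illustration, and the base address has to come from the debugger's loaded-module list, which is not part of this report.

```python
def parse_module_offset(text):
    """Split a 'Module.sys+hexoffset' string into (module, offset)."""
    module, _, offset = text.rpartition("+")
    return module, int(offset, 16)


def absolute_address(base, text):
    """Add the parsed offset to a module base address supplied by the caller."""
    return base + parse_module_offset(text)[1]


for location in ("OpenZFS.sys+54de92", "OpenZFS.sys+1807e0"):
    module, offset = parse_module_offset(location)
    print(f"{module}: offset {offset:#x}")
```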
Describe how to reproduce the problem