SPLError: 5906:0:(zfs_znode.c:330:zfs_inode_set_ops()) SPL PANIC #709
Comments
Sorry, I gave some bad information then... after more testing, the problem is that if I run zfs umount -a, the system needs to be rebooted before it will mount again. This is reproducible every time.
Argh, I really need to make a new tag. This was fixed post-rc8, so just grab the latest source from master, or if you're running Ubuntu the daily PPA will have the fix.
Hi Brian,
That's quite a bit slower than I'd expect. Your files don't happen to have a lot of xattrs, do they?
Hi Brian, Funny you should say that. They all have xattrs... that's how I am storing Windows ACLs. Any way of speeding it up? Thanks.
Xattrs are known to be quite slow with zfs because of how they are stored on disk. This is basically true for all the various implementations, it's not specific to Linux. However, you're in luck: because Lustre makes extensive use of xattrs, we've done some work to optimize their performance in the Linux zfs port. If you set the xattr property on your dataset to sa, small xattrs are stored as system attributes alongside the file's dnode instead of in a separate xattr directory, which avoids the extra disk I/O.

So you may want to enable the feature on a new dataset and then rsync the files in to it. Then do a little testing and see how much the traversal time improves. I'd be very interested in your before and after numbers.

  zfs set xattr=sa tank/fish
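A minimal sketch of that workflow; the dataset name tank/fish is taken from the command above, while the source path is a placeholder:

  zfs create tank/fish
  zfs set xattr=sa tank/fish          # must be set before the files are written
  # -X copies xattrs and -A copies ACLs; time the run to get before/after numbers
  time rsync -aAX /path/to/source/ /tank/fish/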
Thanks, Brian. I'll recreate the filesystem and let you know the results... great product by the way.
Hi Brian, All was going really well... until. I had loaded up 1.4TB of 1.7 million files and then it stopped with:

May 3 20:52:36 bk568 kernel: ZFS: Invalid mode: 0xd276

Spl.log has this at the end:

Any ideas? Thanks. Mark
Hi Brian, The pool cannot be read from now. I had to pull the power out as it was stuck going down, unmounting filesystems. When it comes back up the pool is there, but:

[root@bk568 D]# ls
Message from syslogd@bk568 at May 3 22:06:09 ...
Message from syslogd@bk568 at May 3 22:06:09 ...

Is this fixable? Thanks.
Have you tried to load the zfs module with the parameter zfs_recover=1, then import and scrub the pool?
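A sketch of that sequence, assuming the pool is named tank and is not currently imported; zfs_recover only relaxes certain assertions, so treat it as a recovery aid rather than a fix:

  rmmod zfs                    # unload so the module option takes effect
  modprobe zfs zfs_recover=1
  zpool import tank
  zpool scrub tank
  zpool status -v tank         # check scrub progress and any reported errors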
I'll give that a go, but do you know what has caused this and if it will repeatedly happen? Would it be setting xattr=sa? I have found the pool loads and other filesystems mount and can be read; it is just one filesystem that causes the kernel panic. Thanks for your help and for the work you do on zfs. I would really like to use this in production rather than btrfs with only compression. Zfs dedup will save me loads of space. Any logs or help you need, just ask.
The initial failure is interesting: basically you have triggered an assertion in the code because you tried to create a file with the mode set to (S_IFREG | S_IFDIR | S_IFIFO). That's just nonsense and I'm surprised the VFS allowed it. The assertion is there because I was fairly sure the VFS wouldn't permit this. However, since it does pass it through, we'll have to do something with it... either return an errno or silently set the mode bits to something reasonable. For now you'll want to locate this file (and others like it) in your source filesystem and inspect it for other damage. Then set its permissions to something sane, which will let you avoid this issue in the next rsync until we figure out the right thing to do with this. Next you'll want to set the zil_replay_disable module option and then export and re-import the pool.
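For reference, a quick way to see why the logged 0xd276 trips this assertion; the masks below are the standard Linux mode bits from <linux/stat.h>, not values taken from this thread:

  # S_IFMT is 0xf000; S_IFREG=0x8000, S_IFDIR=0x4000, S_IFIFO=0x1000
  printf 'type bits: 0x%x\n' $(( 0xd276 & 0xf000 ))   # 0xd000 = S_IFREG|S_IFDIR|S_IFIFO
  printf 'perm bits: %o\n'   $(( 0xd276 & 0x0fff ))   # the remaining permission bits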
Thanks, Brian. Is there a 'find' command I can use to locate the files/directories that have this set? I cannot get into the filesystem to find them anyway, so what steps do I need to go through to recover this? From what I understand from your email, I'll need to export it and then import with the zil_replay_disable option. By the way, where do I set that option? Many thanks. Mark
By the way... the files were coming from rsync on an XFS system (CentOS 6.2), through ssh, to rsync on zfs (CentOS 6.2).
Hi Brian, Sorry - I set options zfs zil_replay_disable=1 in /etc/modprobe/zfs.conf, exported the pool, imported the pool. Any ideas? Thanks.
You should also "rmmod zfs" after exporting the pool and re-insert the module - or reboot.
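A sketch of the full sequence being suggested; the pool name is a placeholder, and zil_replay_disable has to be set before the module is (re)loaded (the conventional location for the options file is /etc/modprobe.d/):

  echo 'options zfs zil_replay_disable=1' > /etc/modprobe.d/zfs.conf
  zpool export tank
  rmmod zfs            # or reboot, so the new option takes effect
  modprobe zfs
  zpool import tank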
I'd start by running a scrub. Alternately, you could try this quick patch, behlendorf/zfs@0a6b2b4, which should convert the assertion into just an error code back to rsync. To use the ...
Hi, I have set the zil_replay_disable=1 option, exported, rebooted, imported.

ZFS: Invalid mode: 0xd276

If instead I cd into a subdirectory of that filesystem, it works fine, but I cannot ls the base of the filesystem. I tried a rm * to delete anything but that caused a kernel panic too. Any ideas? I haven't applied the patch as I didn't think it would fix an already broken filesystem. I left a zpool scrub working overnight and it said there were no errors. Thanks.
Some more information: If I cd into the filesystem and do:

That shows everything, but 'ls' crashes the kernel.
I edited zfs_znode.c to add your changes in, but rc = zfs_inode_set_ops(zsb, ip); does not compile since rc is not defined, so I changed it to int rc = zfs_inode_set_ops(zsb, ip); this compiled but complained about declarations being different or something.
CC [M]  /tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.o
/tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.c: In function 'zfs_znode_alloc':
/tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.c:401: error: 'rc' undeclared (first use in this function)
/tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.c:401: error: (Each undeclared identifier is reported only once
/tmp/zfs-build-root-6bPEtzO1/BUILD/zfs-0.6.0/module/zfs/../../module/zfs/zfs_znode.c:401: error: for each function it appears in.)
I managed to get the filesystem to read. The rsync with extended attributes is still very slow, I think slower than before. But it looks like it is cached, because if I kill the rsync it resumes very quickly... however at 100 files per second it is just too slow.
Hi Brian,
@MarkRidley123 Glad to hear you were able to get the pool readable. As for your performance observations, I actually don't think that's too surprising. The way the SA xattrs work is that very small xattrs (<100 bytes) will be saved in the 512 byte dnode on disk. That means reading them is basically free since you've already paid the cost to read the dnode in to memory. Slightly larger xattrs, in the 100 byte to 64KB range, will be stored in what zfs calls a spill block, and that will need to be read off disk, resulting in an extra I/O operation. Xattrs larger than 64KB will get stored in the legacy xattr directory format; they may be of arbitrary size but will be even slower due to an additional disk I/O in the lookup process. So for your case 100 files/s may seem slow, but I suspect that's largely being limited by your disk IOPS. You can check this by watching the disk I/O while the rsync runs.
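One way to confirm the traversal is IOPS-bound is to watch disk activity while the rsync runs; a sketch using standard tools, with the pool name assumed:

  zpool iostat -v tank 5    # per-vdev read/write operations every 5 seconds
  iostat -x 5               # per-disk utilisation and IOPS (from the sysstat package)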
@behlendorf Re my experience in openzfs/spl#112, there are definitely no bad modes in the source fs, and the "Invalid mode" message appeared at different times in different directories. It seems the underlying problem may be due to ZFS corrupting something on the way to the disk, rather than VFS simply allowing bad modes to be set/retrieved? Of course ZFS itself should still be able to cope with bad modes as you've indicated.
Here we go, an easy reproducer, using either zpool export/import or a snapshot. Export/import:
Snapshot (after rebooting from the export/import test):
And further... no xattr on dir, xattr on file => no fail. OK, given it's now 6:30 Sunday evening, I think I'm done!
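A sketch of the kind of export/import reproducer being described; the exact commands were not preserved in this thread, so the dataset name and the xattr used here are assumptions:

  zfs create -o xattr=sa tank/repro
  mkdir /tank/repro/dir
  setfattr -n user.test -v hello /tank/repro/dir   # xattr on the directory (the failing case above)
  touch /tank/repro/dir/file
  zpool export tank && zpool import tank           # force the znodes to be re-read from disk
  ls -l /tank/repro/dir                            # this is where the invalid mode was reported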
Thanks for the simple test case, that should make it easier to run to ground. It also sounds like we're somehow properly creating the file/dir and xattrs in memory and then not flushing them to disk. That would account for why they remain good as long as they are in the cache. However, once they are dropped from the cache (export/import, or accessed via a snapshot) and need to be refetched from disk with zfs_zget(), we notice the damage. I'm assuming you can't reproduce this with xattr=on?
Correct, can't reproduce with xattr=on.
Confirmed still happening with openzfs/spl@e0093fe, 74497b7 on linux-3.3.8:
And extracted from kern.log:
It was determined this was caused by using a private SA handle. However, we still haven't produced an updated patch to address this.
This reverts commit ec2626a which caused consistency problems between the shared and private handles. Reverting this change should resolve issues openzfs#709 and openzfs#727. It will also reintroduce an arc_anon memory leak which is addressed by the next commit.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#709
Issue openzfs#727
The bug causing corruption of the mode bits may still exist in rc14. I apologize that I do not have the exact log messages as the SPL PANIC message was not logged to disk, the system was unresponsive and the message scrolled off the console before I could copy it by hand. However, this pool was created using rc14 and xattr=sa to begin with and later changed to xattr=dir. I was using rsync to copy data over the network from another pool attached to another system when the SPL PANIC occurred. The messages did include an error about an invalid mode. I have since destroyed the pool and recreated it, again with rc14, but with xattr left at the default. If I experience any further issues, I will post back here. Again, I apologize that I do not have the exact messages.
@dringdahl Please keep us updated but this issue should be very much dead.
@behlendorf I am only about a day into the copy this go-around and have had xattr=on from the beginning on the destination pool. If it would help, I am willing to start over with xattr=sa, assuming you can provide a patch to have the error returned to rsync rather than zfs locking up. Please note that both source and destination pools have the following set:

Please let me know what you would like me to do.
@dringdahl Unfortunately, the reason this is fatal in the existing code is that there is currently no safe way to handle it. However, you can try the following patch in your local tree, which will, instead of panicking, just treat the inode as a regular file.

diff --git a/module/zfs/zfs_znode.c b/module/zfs/zfs_znode.c
index 9bf26a7..2b78501 100644
--- a/module/zfs/zfs_znode.c
+++ b/module/zfs/zfs_znode.c
@@ -335,8 +335,15 @@ zfs_inode_set_ops(zfs_sb_t *zsb, struct inode *ip)
 		break;
 	default:
-		printk("ZFS: Invalid mode: 0x%x\n", ip->i_mode);
-		VERIFY(0);
+		/*
+		 * FIXME: Unknown type but treat it as a regular file.
+		 */
+		printk("ZFS: Invalid mode: 0x%x assuming file\n", ip->i_mode);
+		ip->i_mode |= S_IFREG;
+		ip->i_op = &zpl_inode_operations;
+		ip->i_fop = &zpl_file_operations;
+		ip->i_mapping->a_ops = &zpl_address_space_operations;
+		break;
 	}
 }
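If you want to try it, a sketch of applying a patch like this to a source checkout; the patch file name is a placeholder and the usual autotools build for zfsonlinux is assumed:

  cd zfs
  patch -p1 < invalid-mode-fallback.diff   # or: git apply invalid-mode-fallback.diff
  ./configure && make
  make install && depmod -a                # then reload the zfs module or reboot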
@behlendorf I would try to modify the code to do so but I am not familiar enough with it to be certain I would not introduce a memory leak or some other problem. I'm also happy to have additional verbose debugging to help isolate where the issue is.
@dringdahl That would be my preference as well, but unfortunately this particular call path was never written to handle an error at this point. That's a long-standing bit of technical debt we inherited which needs to be addressed at some point. But that means that today there's no particularly easy way to handle this in a non-fatal way.
@behlendorf |
I tried skimming through the issue but I am unsure if there is a known solution or workaround. I am seeing this error when the pool experiences heavy activity. My setup is Fedora 20 running kernel 3.12.5 and zfs/spl 0.6.2 (git as of 20131221). My xattr setting is "on" and not "sa". Is there anything I can do to mitigate the crash?
I'm happy with my current performance, I'd just like to work around the kernel oops. Would xattr=sa also do that? From what I read above, my understanding is that xattr=sa triggers the bug more often...
@firewing1 Brian's patch above should let you avoid the ASSERT, but I'd enhance the patch to include the inode number to make it easier to find the affected file[s], e.g.:
With that patch installed you'd then try to find all the affected files.

You say you see the problem under heavy activity. Can you recreate it without the heavy activity, and is it consistently on specific files? E.g. with nothing else running on the filesystem, do a full recursive listing and see whether the same files are reported.

The original bug was caused by xattr=sa. It doesn't seem likely that setting xattr=sa will help avoid the problem. Is it possible this filesystem ever had xattr=sa turned on between the original problem being introduced and the fix being applied? The original problem was committed 2012-03-02 and the fix was committed 2012-08-23, but of course it depends on when you might have picked up the code.
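For example, once the log message includes the inode number, something along these lines could map it back to a path; the mount point and inode number here are placeholders:

  ls -lR /tank/fish > /dev/null        # touch every inode so damaged ones are logged
  dmesg | grep 'ZFS: Invalid mode'     # collect the reported inode numbers
  find /tank/fish -xdev -inum 123456   # map an inode number back to its path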
@chrisrd Thanks for the patch - I've applied it and listed all files on the filesystem; a few showed up and fortunately I can easily replace them. I will try and do some more testing to see if I can trigger the issue reliably, but so far it hasn't happened during regular use (mainly serving up media files via Plex and acting as a user /home). I encountered the errors while copying several GB of data and (in parallel) issuing a recursive chown. This is a brand new pool as of mid-December, so no chance of being bitten by the old code.
@firewing1 You may be right about the chown: see #1978. ...except that only affects xattr=sa file systems, and it sounds like you're not using that - can you confirm this either way? If the fs has ever had xattr=sa set you may have run into the same issue, especially with your suspicions about chown. Does your code include @5d862cb0d9a4b6dcc97a88fa0d5a7a717566e5ab? If not, that's very likely the problem. However you've said you're running git as of 20131221 and that commit is dated 20131219. And if the fs has never had xattr=sa, then it seems we're looking at a new problem :-(
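If it's unclear whether xattr=sa was ever set on that filesystem, the pool history records past property changes; a sketch, with the pool and dataset names assumed:

  zpool history tank | grep -i xattr   # shows any past 'zfs set xattr=...' commands
  zfs get xattr tank/fish              # current setting only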
Hi,
I am testing rc8 inside ESX 5 on some fast Dell hardware. I've given it 15GB of memory.
I then rebooted the server and now the pool will not mount. This is in the log:
What do I need to do? Thanks.
[ 957.090546] BUG: unable to handle kernel NULL pointer dereference at 0000000000000490
[ 957.090925] IP: [] zfs_preumount+0x10/0x30 [zfs]
[ 957.091220] PGD 3ba908067 PUD 3aa869067 PMD 0
[ 957.091543] Oops: 0000 [#11] SMP
[ 957.091805] CPU 1
[ 957.091892] Modules linked in: lockd ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables zfs(PO) zcommon(PO) znvpair(PO) zavl(PO) zunicode(PO) spl(O) shpchp ppdev parport_pc parport i2c_piix4 i2c_core vmw_balloon vmxnet3 microcode sunrpc btrfs zlib_deflate libcrc32c vmw_pvscsi [last unloaded: scsi_wait_scan]
[ 957.094416]
[ 957.094543] Pid: 2097, comm: mount.zfs Tainted: P D O 3.3.2-1.fc16.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[ 957.095116] RIP: 0010:[] [] zfs_preumount+0x10/0x30 [zfs]
[ 957.095528] RSP: 0018:ffff8803ba23dd08 EFLAGS: 00010296
[ 957.095728] RAX: 0000000000000000 RBX: ffff8803e9624800 RCX: 000000000000fba2
[ 957.095963] RDX: 000000000000fba1 RSI: ffff8803e8207000 RDI: 0000000000000000
[ 957.096198] RBP: ffff8803ba23dd08 R08: 00000000000165f0 R09: ffffea000fa08000
[ 957.096434] R10: ffffffffa016499b R11: 0000000000000000 R12: ffffffffa02cd4c0
[ 957.096670] R13: ffff8803ba23dd88 R14: fffffffffffffff0 R15: 0000000002000000
[ 957.096921] FS: 00007f3c0c238c00(0000) GS:ffff8803ffc80000(0000) knlGS:0000000000000000
[ 957.097228] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 957.097437] CR2: 0000000000000490 CR3: 00000003b29dd000 CR4: 00000000000006e0
[ 957.097676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 957.097916] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 957.098152] Process mount.zfs (pid: 2097, threadinfo ffff8803ba23c000, task ffff8803e8bf2e60)
[ 957.098461] Stack:
[ 957.098600] ffff8803ba23dd28 ffffffffa02a37e6 00000a00ba23ddb8 ffff8803e9624800
[ 957.099089] ffff8803ba23dd48 ffffffff8118389c 00000000fffffff0 ffffffffa02a3830
[ 957.099578] ffff8803ba23dd78 ffffffff8118409b ffff8803ab732a00 ffffffffa02cd4c0
[ 957.100073] Call Trace:
[ 957.100258] [] zpl_kill_sb+0x16/0x30 [zfs]
[ 957.100476] [] deactivate_locked_super+0x3c/0xa0
[ 957.100733] [] ? zpl_mount+0x30/0x30 [zfs]
[ 957.100944] [] mount_nodev+0xab/0xb0
[ 957.101181] [] zpl_mount+0x25/0x30 [zfs]
[ 957.101388] [] mount_fs+0x43/0x1b0
[ 957.101586] [] vfs_kern_mount+0x6a/0xf0
[ 957.101791] [] do_kern_mount+0x54/0x110
[ 957.101996] [] do_mount+0x1a4/0x830
[ 957.102198] [] ? copy_mount_options+0x3a/0x170
[ 957.102417] [] sys_mount+0x90/0xe0
[ 957.102616] [] system_call_fastpath+0x16/0x1b
[ 957.102836] Code: 05 00 00 48 c7 c6 eb f1 2a a0 48 c7 c7 c0 ff 2b a0 e8 b5 9e ed ff e9 b8 fe ff ff 55 48 89 e5 66 66 66 66 90 48 8b bf f0 02 00 00 <48> 83 bf 90 04 00 00 00 74 05 e8 f1 cc fe ff 5d c3 66 66 66 66
[ 957.106671] RIP [] zfs_preumount+0x10/0x30 [zfs]
[ 957.106983] RSP
[ 957.107150] CR2: 0000000000000490