Integer overflow in dmu.c #8778
Labels
Type: Defect
Incorrect behavior (e.g. crash, hang)
Comments
@mazouffre your analysis looks correct to me. Thank you very much for taking the time to get to the root cause of the issue. Would you mind opening a PR which adds the needed cast?
behlendorf pushed a commit that referenced this issue May 29, 2019
dn->dn_datablksz type is uint32_t and need to be casted to uint64_t to avoid an overflow when the record size is greater than 4 MiB.
Reviewed-by: Tom Caputi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Olivier Mazouffre <[email protected]>
Closes #8778
Closes #8797
pvdabeel pushed a commit to pvdabeel/gentoo that referenced this issue May 29, 2019
Issue: openzfs/zfs#8816
Issue: openzfs/zfs#8778
Bug: https://bugs.gentoo.org/635002
Package-Manager: Portage-2.3.67, Repoman-2.3.12
Signed-off-by: Georgy Yakovlev <[email protected]>
System information
Describe the problem you're observing
I encountered a kernel panic (divide-by-zero exception) when importing a specific pool. The panic was triggered by the processing of one entry in the pool's delete queue. I managed to import the pool with the parameter zfs_unlink_suspend_progress set. The zpool status command reported that the pool was healthy, and a scrub detected no errors.
After digging, the panic seems to come from an integer overflow on this line: https://github.com/zfsonlinux/zfs/blob/master/module/zfs/dmu.c#L723.
The arithmetic is done in unsigned 32-bit, so with dn->dn_indblkshift equal to 17 and SPA_BLKPTRSHIFT equal to 7, the largest value of dn->dn_datablksz that does not overflow is 4 MiB - 1.
However, the record size of the pool is 4 MiB, hence the overflow and the panic. The manual of the zfs kernel module states that record sizes of up to 16 MiB are supported.
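The limit can be checked outside the kernel. The following is a minimal sketch, not the code from dmu.c: the function names are hypothetical, and the multiplier `1 << (indblkshift - blkptrshift)` is an assumption that is consistent with the 4 MiB - 1 limit stated above (2^32 / 2^(17 - 7) = 4 MiB).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: number of block pointers in one indirect block. */
static uint32_t
entries_per_block(uint32_t indblkshift, uint32_t blkptrshift)
{
    assert(indblkshift >= blkptrshift && indblkshift - blkptrshift < 32);
    return ((uint32_t)1 << (indblkshift - blkptrshift));
}

/*
 * Bytes covered by one L1 indirect block, computed in 32 bits.
 * The product wraps modulo 2^32 before it is widened to 64 bits,
 * which is the behavior described in this issue.
 */
static uint64_t
l1_range_32bit(uint32_t datablksz, uint32_t indblkshift, uint32_t blkptrshift)
{
    uint32_t product = datablksz * entries_per_block(indblkshift, blkptrshift);
    return (product);
}
```

With an indirect block shift of 17 and a block pointer shift of 7 the multiplier is 1024, so a 4 MiB block size gives a 32-bit product of exactly 2^32, which wraps to 0, while 4 MiB - 1 still yields a nonzero range.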
The kernel log with a few debug printk calls added to the get_next_chunk function of dmu.c:
In my case, because of the overflow and the values of the other variables, several 0/0 operations occur on this line: https://github.com/zfsonlinux/zfs/blob/master/module/zfs/dmu.c#L733. Depending on the code generated by the compiler, a divide-by-zero exception may or may not be triggered; I have encountered both cases. I used a dump of the pool (zdb -bcc -x) to reproduce the bug at will.
After adding a cast operator:
the bug seems to be gone:
I can import the pool without error.
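A cast of this kind can be sketched as follows. This is an illustration under assumptions, not the patch itself: the function and parameter names are hypothetical, and the multiplier `1 << (indblkshift - blkptrshift)` is assumed. The only point it makes is that widening the 32-bit block size to uint64_t before multiplying keeps the product exact.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes covered by one L1 indirect block, computed in 64 bits.
 * datablksz stays a 32-bit value, as dn->dn_datablksz is in the issue.
 */
static uint64_t
l1_range_64bit(uint32_t datablksz, uint32_t indblkshift, uint32_t blkptrshift)
{
    assert(indblkshift >= blkptrshift && indblkshift - blkptrshift < 32);
    uint32_t epb = (uint32_t)1 << (indblkshift - blkptrshift);
    /* Widen first so the multiplication is carried out in 64 bits. */
    return ((uint64_t)datablksz * epb);
}
```

With shifts of 17 and 7, a 4 MiB block size now gives 2^32 and a 16 MiB block size gives 2^34, neither of which wraps.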
Include any warning/errors/backtraces from the system logs
The kernel log of the panic: