SATA trim for vdev members #598
I believe there was some work on this going on at Illumos. We should check and see what they have done; with luck we'll just be able to port their changes with some modifications for Linux.
What's the story on this? I badly need to TRIM my underlying device here.
Should be easy for L2ARC devices (just discard the whole device on add/import) and SLOG devices (discard the log when the txg is synced). Not so easy for actual data devices - as I remember from some discussion on the original ZFS mailing list a few years ago, there were serious issues involved. Most importantly, immediately discarding unused blocks could make zpool disaster recovery (i.e. import -F) impossible.
What about batched discard? ZFS can know which areas of the disks are not used by a process of elimination, so it should at least implement what the fstrim command needs to do its job.
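For reference, the "batched discard" mechanism fstrim(8) uses on other Linux filesystems is the FITRIM ioctl: userspace hands the filesystem a range and a minimum extent size, and the filesystem discards whatever it knows to be free within that range. The following is a minimal userspace sketch of that interface only (the mount point is a placeholder), not ZFS code:

```c
/*
 * Userspace illustration of the batched-discard interface used by fstrim(8):
 * the FITRIM ioctl asks a mounted filesystem to discard its free space.
 * The mount point below is a placeholder.
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* FITRIM, struct fstrim_range */

int main(void)
{
    int fd = open("/mnt/example", O_RDONLY);    /* any path on the filesystem */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct fstrim_range range = {
        .start  = 0,
        .len    = UINT64_MAX,   /* consider the whole filesystem */
        .minlen = 0,            /* no minimum extent size */
    };

    /* The filesystem walks its free-space accounting and issues discards. */
    if (ioctl(fd, FITRIM, &range) < 0) {
        perror("FITRIM");
        close(fd);
        return 1;
    }

    /* On success, range.len is updated to the number of bytes trimmed. */
    printf("trimmed %llu bytes\n", (unsigned long long)range.len);
    close(fd);
    return 0;
}
```

A ZFS analogue would have to derive free space from the pool-wide space maps rather than a per-filesystem free list, which is why FITRIM's per-filesystem scope comes up again later in this thread.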
It should be possible to implement using the free space map. The only problem is that no one has time to do it.
One of the FreeBSD developers has developed this: http://people.freebsd.org/~pjd/patches/zfstrim.patch He told me that it is meant for single-SSD pools. We only need to import it to gain TRIM support.
This patch seems to indeed implement the desired feature, and it even delays discards by some TXGs to avoid breaking disaster recovery (import -F). So, is there any discussion on this patch somewhere? Code review? Test results? Googling "zfstrim.patch" doesn't seem to yield anything. Quite frankly, merging this patch without extensive code review and testing would be pure madness IMO. Also, this patch doesn't discard whole cache and log devices on export/import, although that's easy to implement separately (and much less dangerous).
@dechamps It was a funded project in FreeBSD. pjd told me that the people who funded it have put it into production use. You can email him @freebsd.org with questions. By the way, do you intend to try porting this? It is on my to-do list, although you probably would get to it sooner than I will.
Mail sent.
I'm not quite sure yet; it depends on what my workload will be in the coming week. Anyway, as I understand it, the patch is very easy to port: it seems the only thing to do is to replace FreeBSD's BIO_DELETE operation with Linux's BLKDISCARD.
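For anyone unfamiliar with the Linux side of that substitution, the userspace face of the discard path is the BLKDISCARD ioctl; a hypothetical, self-contained sketch follows (the device path and the extent are placeholders, and an actual in-kernel port would go through the kernel's discard request machinery rather than this ioctl):

```c
/*
 * Sketch of the Linux userspace counterpart of FreeBSD's BIO_DELETE: the
 * BLKDISCARD ioctl, which discards a {start, length} byte range on a block
 * device. The device path and the extent below are placeholders.
 */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKDISCARD */

static int discard_extent(int fd, uint64_t start, uint64_t length)
{
    uint64_t range[2] = { start, length };  /* both values in bytes */
    return ioctl(fd, BLKDISCARD, range);
}

int main(void)
{
    /* Write access is required to issue discards. */
    int fd = open("/dev/sdX", O_WRONLY);    /* placeholder device */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Discard 1 MiB starting at the 1 MiB offset. */
    if (discard_extent(fd, 1ULL << 20, 1ULL << 20) < 0)
        perror("BLKDISCARD");

    close(fd);
    return 0;
}
```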
Here are pjd's answers to my questions:
In light of this new information, I think we can proceed with the merge, but maybe we should disable the feature by default.
I think we should proceed very, very, very carefully here. We're talking about touching some delicate code with a fairly intrusive change. I know this would be a great feature, but it would be easy to get this subtly wrong. Why don't we start by porting the FreeBSD patch to a ZoL branch? Then those who are interested in this feature can get some serious testing on the code before we merge it in. I'd rather not risk destabilizing master until we are more confident about this change.
I think that is very reasonable.
I agree. I'm still not sure I'll have the time to come up with the branch though, so if someone wants to jump in, go ahead. Like I said, this should be fairly easy to port.
I am in agreement with everyone else. My time is being spent on other things though. If someone does the initial port, I am willing to field test it on my desktop. The worst case is that I will have to restore it from a backup.
My laptop is willing to test the patch. I regularly do backups.
I can probably test this too. I, too, have ZFS-over-LUKS on an SSD.
Here it is: #924. It definitely needs extensive testing, so if you're not afraid to corrupt your pool, go ahead.
Status on this?
Largely done and working, but it needs to be rebased on master and seriously tested. I definitely do want to get it merged before it gets much staler.
Maybe it is time to rebase it? Perhaps a beta/experimental tarball could be released to engage more people in testing the feature.
I figure I'm far enough along to mention I'm working on porting Nexenta's TRIM support. It uses a rather different approach than the one described earlier in this issue. My plan is to try to get disk and file vdevs working first and then deal with zvols. It's a good ways off yet but so far is looking good. I'll post more information to this particular issue as it becomes available.
This makes no sense. Surely you mean initiating TRIM/UNMAP for vdevs?
Let me clarify the part regarding zvols: I want to evaluate whether to try replacing the existing zvol discard code with that from Nexenta. The particular use case for which I was asked to do this work is file vdevs, so I figured on porting the code and testing with file and disk vdevs first and then looking at the zvol situation later.
This appears to be coming together rather nicely and, so far, vdev file trimming (via hole-punching) appears to be working properly and, as expected, the existing zvol code works just fine (tested with a "discard"-mounted ext4 atop a zvol). I've not dealt with vdev disks yet but am instead running a bunch of stress tests with vdev file-based pools to make sure nothing has broken. I'll also be porting Nexenta's on-demand trim feature. The WIP spl code is in https://github.com/dweeezil/spl/tree/ntrim and the WIP zfs code is in https://github.com/dweeezil/zfs/tree/ntrim.
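For readers wondering how a file vdev can be "trimmed" at all: on Linux this boils down to punching holes in the backing file with fallocate(), which deallocates the byte range while leaving the file size intact. A minimal standalone sketch under that assumption (the path and offsets are placeholders, not anything taken from the branches above):

```c
/*
 * Sketch of how "trimming" a file vdev maps onto Linux: punching a hole in
 * the backing file with fallocate(FALLOC_FL_PUNCH_HOLE), which deallocates
 * the byte range while keeping the file size unchanged. The path and the
 * offsets are placeholders.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>              /* fallocate(), FALLOC_FL_* */
#include <unistd.h>

int main(void)
{
    int fd = open("/tank/file-vdev0", O_WRONLY);    /* placeholder file vdev */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Punch a 1 MiB hole at the 16 MiB offset; KEEP_SIZE is required here. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  16L << 20, 1L << 20) < 0)
        perror("fallocate");

    close(fd);
    return 0;
}
```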
Sigh, so many ZFS ports, so little time :-) FWIW, Solaris 11 has had vdev-level UNMAP/TRIM for years. By default it is disabled.
Out of curiosity, will there be an fstrim-like (offline) solution available?
The patch set seems to be working at this point. Please see #3656. @tomposmiko With the on-demand trim feature,
@tomposmiko I had forgotten how weird FITRIM is. Among other things, it targets a single filesystem, which in ZFS really isn't possible since space allocation is a pool attribute. I think
@dweeezil Theoretically you could perform a reverse lookup from filesystem to pool(s) and then perform "zpool trim" on the result ...
Would this work on ZFS over LUKS? We would need to enable discard for the LUKS devices, of course, but what else would be needed?
@CMCDragonkai I currently run ZFS on LUKS and I just had to open the volume with discards enabled.
Hi. Can anyone provide some instructions on how to run the TRIM version of ZFS on Gentoo? I urgently need it; my SSD is failing. Is it possible to add patches to Portage or modify the ebuild file?
I don't believe TRIM will help fix a failing SSD.
If your SSD is failing, as @drescherjm said, TRIM won't help. What you want to do, and urgently, is:
Hopefully you are using RAID and backups. Since this GitHub issue is for the ZFS TRIM feature and not for support, if you need any volunteer assistance with this, talk to me on Twitter at @aw_nz (https://twitter.com/aw_nz).
Thanks for the answers. I understand, but I have a heavy DB running on it and I don't want to trash a newly bought SSD again. Of course I have backups/snapshots. I just want to enable/test TRIM on ZFS. I think these instructions will help other testers/devs test the TRIM feature, so it can be on-topic here. Thank you very much.
@behlendorf, @dweeezil: close as a duplicate because it's been superseded by #5925?
Is pull request #8419 the final one that will actually fix this issue? (I don't understand why this issue was closed last year.)
@mailinglists35 Yes, #8419 is the final version. This was closed as a duplicate of a previous version of TRIM.
UNMAP/TRIM support is a frequently-requested feature to help prevent performance from degrading on SSDs and on various other SAN-like storage back-ends. By issuing UNMAP/TRIM commands for sectors which are no longer allocated, the underlying device can often manage itself more efficiently.

This TRIM implementation is modeled on the `zpool initialize` feature which writes a pattern to all unallocated space in the pool. The new `zpool trim` command uses the same vdev_xlate() code to calculate what sectors are unallocated, the same per-vdev TRIM thread model and locking, and the same basic CLI for a consistent user experience. The core difference is that instead of writing a pattern it will issue UNMAP/TRIM commands for those extents.

The zio pipeline was updated to accommodate this by adding a new ZIO_TYPE_TRIM type and associated spa taskq. This new type makes it straightforward to add the platform-specific TRIM/UNMAP calls to vdev_disk.c and vdev_file.c. These new ZIO_TYPE_TRIM zios are handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs. This makes it possible to largely avoid changing the pipeline; one exception is that TRIM zios may exceed the 16M block size limit since they contain no data.

In addition to the manual `zpool trim` command, a background automatic TRIM was added and is controlled by the 'autotrim' property. It relies on the exact same infrastructure as the manual TRIM. However, instead of relying on the extents in a metaslab's ms_allocatable range tree, a ms_trim tree is kept per metaslab. When 'autotrim=on', ranges added back to the ms_allocatable tree are also added to the ms_trim tree. The ms_trim tree is then periodically consumed by an autotrim thread which systematically walks a top-level vdev's metaslabs.

Since the automatic TRIM will skip ranges it considers too small, there is value in occasionally running a full `zpool trim`. This may occur when the freed blocks are small and not enough time was allowed to aggregate them. An automatic TRIM and a manual `zpool trim` may be run concurrently, in which case the automatic TRIM will yield to the manual TRIM.

Reviewed-by: Jorgen Lundman <[email protected]>
Reviewed-by: Tim Chase <[email protected]>
Reviewed-by: Matt Ahrens <[email protected]>
Reviewed-by: George Wilson <[email protected]>
Reviewed-by: Serapheim Dimitropoulos <[email protected]>
Contributions-by: Saso Kiselkov <[email protected]>
Contributions-by: Tim Chase <[email protected]>
Contributions-by: Chunwei Chen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #8419
Closes #598
ZoL should issue SATA TRIM commands to the underlying block devices when blocks are freed.
For CACHE devices (unless persistent L2ARC is implemented at some point in the future): on cache device activation (and deactivation, e.g. zpool remove), a TRIM command should free the whole space (minus needed persistent superblocks) available on such devices.
Only then will SSDs be hinted about unused blocks so their wear-leveling algorithms can function as intended.