TRIM/UNMAP/DISCARD support for vdevs (2) #1016
Conversation
Currently, the PSIZE of the block pointer is used as the I/O size for all free ZIOs. This is incorrect for gang header blocks, for which PSIZE is the logical size of the gang block data. This patch fixes the issue by using SPA_GANGBLOCKSIZE as the I/O size for "free gang header block" ZIOs.
Currently, zavl users cannot call avl_is_empty since it is not exported. This patch fixes that.
The code builds a map of regions that were freed. On every write the code consults the map and eventually removes ranges that were freed before, but are now overwritten. Freed blocks are not TRIMed immediately. There is a tunable that defines how many TXGs we should wait before TRIMming freed blocks (64 by default). There is a low priority thread that TRIMs ranges when the time comes. During TRIM we keep in-flight ranges on a list to detect colliding writes - we have to delay writes that collide with in-flight TRIMs in case requests are reordered and a write reaches the disk before the TRIM. We don't have to do the same for in-flight writes, as colliding writes just remove ranges to TRIM. NOTE: this patch only adds the core infrastructure for TRIM. It doesn't implement actual discard requests on vdevs. In other words, the patch in itself doesn't actually do anything (yet). NOTE: you need DKIOCTRIM support in SPL for this patch to build. Signed-off-by: Etienne Dechamps <[email protected]>
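To make the delayed-TRIM bookkeeping concrete, below is a small, self-contained userspace model of the idea. It is not the actual trim_map.c code (which uses AVL trees, per-vdev locking and a dedicated thread); all names in it are illustrative. Freed ranges are remembered together with the TXG of the free, an overlapping write cancels or splits the pending range, and a commit step only TRIMs ranges that are at least trim_txg_limit TXGs old.

```c
/* Illustrative model only -- not the real trim_map.c. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct seg {
	uint64_t start, end;	/* freed byte range, [start, end) */
	uint64_t txg;		/* TXG in which the free happened */
	struct seg *next;
} seg_t;

static seg_t *freed;				/* ranges waiting to be TRIMmed */
static const uint64_t trim_txg_limit = 64;	/* wait this many TXGs */

/* Remember a freed range along with the TXG of the free. */
static void
map_free(uint64_t start, uint64_t end, uint64_t txg)
{
	seg_t *s = malloc(sizeof (*s));
	s->start = start; s->end = end; s->txg = txg;
	s->next = freed; freed = s;
}

/* A later write cancels (or splits) any pending TRIM that overlaps it. */
static void
map_write(uint64_t start, uint64_t end)
{
	seg_t **pp = &freed;
	while (*pp != NULL) {
		seg_t *s = *pp;
		if (s->end <= start || s->start >= end) {	  /* no overlap */
			pp = &s->next;
		} else if (start <= s->start && end >= s->end) {  /* covered */
			*pp = s->next;
			free(s);
		} else if (start <= s->start) {			  /* clip left */
			s->start = end;
			pp = &s->next;
		} else if (end >= s->end) {			  /* clip right */
			s->end = start;
			pp = &s->next;
		} else {					  /* split in two */
			seg_t *tail = malloc(sizeof (*tail));
			*tail = *s;		/* copies end, txg, next */
			tail->start = end;
			s->end = start;
			s->next = tail;
			pp = &tail->next;
		}
	}
}

/* "TRIM" (here: just print) ranges freed at least trim_txg_limit TXGs ago. */
static void
map_commit(uint64_t synced_txg)
{
	seg_t **pp = &freed;
	while (*pp != NULL) {
		seg_t *s = *pp;
		if (s->txg + trim_txg_limit <= synced_txg) {
			printf("TRIM [%llu, %llu)\n",
			    (unsigned long long)s->start,
			    (unsigned long long)s->end);
			*pp = s->next;
			free(s);
		} else {
			pp = &s->next;
		}
	}
}

int
main(void)
{
	map_free(0, 4096, 10);	/* block freed in TXG 10 */
	map_write(1024, 2048);	/* part of it is overwritten: no TRIM there */
	map_commit(100);	/* 100 >= 10 + 64, so the remainder is TRIMmed */
	return (0);
}
```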
Currently, compilation fails because trim_map_free_locked() exceeds the frame size. This is because we're allocating a large structure (zio_t) on the stack. This patch fixes the issue by only allocating the fields we need from the zio structure.
This patch adds a new kstat page: zio_trim, which indicates the total number of bytes TRIMmed, and the number of successful, unsupported and failed TRIM requests. This makes it easy to verify if TRIM is actually working or not.
Since 2fd4ce81a6748b46728127212a1ba384c008696f, gang block free zios have correct size, so there's no need to skip them. Besides, the test fails with a NULL dereference bug on RAID-Z because io_bp is NULL in this case.
Currently, the TRIM module doesn't check for spa_freeze_txg when issuing TRIM requests. In a normal run this isn't an issue since vdev_config_sync() doesn't wake the TRIM thread if the SPA is frozen. However, when closing a frozen pool, trim_thread_destroy() gets called, which wakes the TRIM thread, which results in a TRIM commit. As a result, all TXGs until spa_syncing_txg - txg_limit get TRIMmed, which includes spa_freeze_txg. Consequently, when reopening, ZFS tries to read data from the frozen TXG, and fails. This patch fixes the issue by taking spa_freeze_txg into account when committing TRIMs.
This adds DISCARD support to vdev_disk_physio(), and uses it to execute DKIOCTRIM requests. The implementation is inspired by the Linux blkdev_issue_discard() function.
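The in-kernel path builds its DISCARD requests in the spirit of blkdev_issue_discard(). As a rough userspace analogue (not part of the patch), the sketch below sends a single discard to a block device via the BLKDISCARD ioctl, which is handy for checking whether a scratch device honours discards at all. The device path, offset and length are whatever you pass on the command line, and the range you discard is destroyed.

```c
/* Userspace analogue of one DISCARD request. WARNING: destroys data. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKDISCARD */

int
main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <device> <offset> <length>\n",
		    argv[0]);
		return (1);
	}

	int fd = open(argv[1], O_WRONLY);
	if (fd < 0) {
		perror("open");
		return (1);
	}

	/* BLKDISCARD takes an {offset, length} pair, in bytes. */
	uint64_t range[2] = { strtoull(argv[2], NULL, 0),
	    strtoull(argv[3], NULL, 0) };

	if (ioctl(fd, BLKDISCARD, range) != 0)
		perror("BLKDISCARD (device may not support discards)");
	else
		printf("discarded %llu bytes at offset %llu\n",
		    (unsigned long long)range[1],
		    (unsigned long long)range[0]);

	close(fd);
	return (0);
}
```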
This adds TRIM support for file vdevs, using the newly-added VOP_SPACE compatibility interface which translates to Linux fallocate(FALLOC_FL_PUNCH_HOLE). When a TRIM request is issued to a file vdev, ZFS will "punch a hole" (deallocate space) inside the underlying file. Note that few Linux filesystems support hole punching (currently OCFS2, XFS, and ext4). Note that for this to work, you need to build against a kernel that supports file hole punching (i.e. Linux >= 2.6.38). In addition, for this to work in userspace (e.g. ztest), the installed userspace kernel headers (i.e. /usr/include/linux) must be up-to-date. NOTE: you need VOP_SPACE() support in SPL for this patch to build.
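To see the underlying primitive in isolation (the patch itself goes through the SPL VOP_SPACE compatibility layer rather than calling fallocate() directly), here is a minimal userspace sketch that punches a hole in an ordinary file. On a filesystem or kernel without support it fails with EOPNOTSUPP, which is also a quick way to check whether your ztest working directory qualifies.

```c
/* Minimal demonstration of the file hole punching primitive. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>	/* fallocate(), FALLOC_FL_* (with _GNU_SOURCE) */
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return (1);
	}

	int fd = open(argv[1], O_WRONLY);
	if (fd < 0) {
		perror("open");
		return (1);
	}

	/* Deallocate 1 MiB at offset 0; the hole reads back as zeroes. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
	    0, 1 << 20) != 0) {
		perror(errno == EOPNOTSUPP ?
		    "hole punching not supported here" : "fallocate");
		close(fd);
		return (1);
	}

	printf("hole punched\n");
	close(fd);
	return (0);
}
```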
This patch adds a new module tunable: zfs_trim_zero. When this parameter is set to 1, ZFS makes sure that DKIOCTRIM zeroes the range being TRIMmed. This makes data corruption issues easier to reproduce, at the cost of greatly decreased performance. On disk vdevs, zfs_trim_zero replaces DISCARD with zero page writes, which means it will work even with vdevs that do not support DISCARD. Note that DISCARD won't be used even on vdevs that support it. On file vdevs, if FALLOC_FL_PUNCH_HOLE fails and zfs_trim_zero=1, then a zero buffer is written instead. Writing zeroes is unnecessary when hole punching works, because FALLOC_FL_PUNCH_HOLE guarantees that the punched range reads back as zeroes.
For ztest to reliably check for corruption issues in the TRIM code, we need to make sure TRIM zeroes data, even if the underlying filesystem doesn't support file hole punching.
ztest runs with TRIM enabled on a backing filesystem that doesn't support hole punching revealed the following race condition:
1. A write is issued to a vdev and added to the TRIM map inflight writes tree.
2. The very first TRIM request for this vdev is issued.
3. The TRIM request fails with EOPNOTSUPP, as expected. vdev_notrim is set.
4. The write completes. trim_map_write_done() is called.
5. trim_map_write_done() sees that vdev_notrim is set, and returns immediately.
6. As a result, the write stays in the TRIM map inflight writes tree forever.
7. When the vdev is closed, trim_map_destroy() tries to destroy the inflight writes tree, which triggers an assertion failure because it isn't empty.
This patch fixes the issue by removing the check on vdev_notrim in trim_map_write_done().
ztest runs with TRIM enabled revealed a race condition which occurs when a vdev close is pending while there are TRIM requests in flight. When the TRIM ZIOs complete, the SPA lock is released. The SPA lock is then immediately taken by the vdev close path, which calls trim_map_destroy() before trim_map_vdev_commit_done() has a chance to run. trim_map_destroy() then tries to destroy the inflight free tree, which fails with an assertion failure because it is non-empty. This patch fixes the issue by making trim_map_destroy() call trim_map_vdev_commit_done() before destroying the TRIM map.
With ashift=12, zio_vdev_io_start() expands all ZIOs with size < 4096 using zio_push_transform(), allocating a new ZIO buffer in the process. The allocation is of course unnecessary for DKIOCTRIM ZIOs, which don't hold any data. This patch makes sure a new buffer is only allocated on read/write ZIOs. This also fixes an assertion failure in __vdev_disk_physio because it was called with DISCARD and a non-null data buffer, which makes no sense.
When a FREE ZIO is issued on raidz, vdev_raidz_map_alloc() skips allocating data for the raidz columns because it is not needed. vdev_raidz_map_free(), however, has not been updated accordingly, and still tries to free the column data. As a result, zio_buf_free() is used with a NULL buffer, which corrupts the zio_buf_cache. This patch fixes the issue by making vdev_raidz_map_free() do a NULL check before freeing the column data. This fixes the following panic when running zconfig.sh -c: SPLError: 17651:0:(spl-kmem.c:1936:spl_kmem_cache_alloc()) ASSERTION(obj) failed SPLError: 17651:0:(spl-kmem.c:1936:spl_kmem_cache_alloc()) SPL PANIC
The trim map inflight writes tree assumes non-conflicting writes, i.e. that there will never be two simultaneous write I/Os to the same range on the same vdev. This seemed like a sane assumption; however, in actual testing, it appears that repair I/Os can very well conflict with "normal" writes. I'm not quite sure if these conflicting writes are supposed to happen or not, but in the meantime, let's ignore repair writes for now. This should be safe considering that, by definition, we never repair blocks that are freed.
The current code releases SCL_STATE while it is waiting for TRIM ZIOs to complete. That's dangerous because the vdev configuration could change while the I/O is in progress. This patch makes sure SCL_STATE is held during the whole duration of the I/O. This is consistent with the way e.g. zil_flush_vdevs() does things.
There's no reason to use the ZIO_FLAG_CONFIG_WRITER flag for TRIM ZIOs. Remove it.
With this patch, ZFS will attempt to TRIM the whole vdev in the following cases: any vdev on zpool create, zpool add, and zpool attach; spare and L2ARC devices on zpool import. We do this to make sure we start from a "clean state". This way, the physical device knows that all the data it held before it was added to the pool is now obsolete. This is similar to what mke2fs does.
This adds TRIM support to cache vdevs. When ARC buffers are removed from the L2ARC in arc_hdr_destroy(), arc_release() or l2arc_evict(), the size previously occupied by the buffer gets scheduled for TRIMming. As always, actual TRIMs are only issued to the L2ARC after txg_trim_limit.
With this patch: - L2ARC vdevs are trimmed on export, except the labels which are preserved to keep import working. - L2ARC vdevs are trimmed entirely on remove and pool destroy.
This patch adds some improvements to the way the trim module considers TXGs: free ZIOs are registered with the TXG from the ZIO itself, not the current SPA syncing TXG (which may be out of date); L2ARC frees are registered with a TXG number of zero, as the L2ARC has no concept of TXGs; and the TXG limit for issuing TRIMs is now computed from the last synced TXG, not the currently syncing TXG. Indeed, under extremely unlikely race conditions, there is a risk we could trim blocks which have been freed in a TXG that has not finished syncing, resulting in potential data corruption in case of a crash.
Currently, the TRIM thread waits indefinitely until it is woken up by spa_sync(). This triggers a blocked task warning in Linux if there is no write activity on the pool for a long time. This patch fixes the issue by making the TRIM thread wake up every second to keep the kernel happy.
While the current code delays TRIMs by a number of TXGs specified by the trim_txg_limit parameter, in most circumstances it still TRIMs frees from only one TXG at a time, i.e. it doesn't merge frees from adjacent TXGs. This patch batches TRIMs from several consecutive TXGs in order to merge more TRIM requests, thus reducing their number and making them larger. This should significantly improve TRIM performance, depending on your workload and how your physical vdev handles TRIM requests. The batch size, expressed in TXGs, is specified by the new trim_txg_batch parameter. It defaults to 32. Note the default for trim_txg_limit has been reduced from 64 to 32 to keep the worst-case trim delay constant.
Currently, the trim module uses the same algorithm for data and cache devices when deciding to issue TRIM requests, based on how far in the past the TXG is. Unfortunately, this is not ideal for cache devices, because the L2ARC doesn't use the concept of TXGs at all. In fact, when using a pool for reading only, the L2ARC is written but the TXG counter doesn't increase, and so no new TRIM requests are issued to the cache device. This patch fixes the issue by using time instead of the TXG number as the criterion for trimming on cache devices. The basic delay/batch principle stays the same, but the parameters are expressed in seconds instead of TXGs. The new parameters are named trim_l2arc_limit and trim_l2arc_batch, and both default to 30 seconds.
@dechamps Thanks for cleaning up the pull request and for the detailed summary. I'm working through reviewing the changes now, but I've noticed the following ASSERT gets regularly tripped by ztest with the TRIM patch stack. This doesn't occur prior to these changes and I'm able to hit it quite easily. Have you seen this? ztest: ../../cmd/ztest/ztest.c:5733: Assertion `0 == spa_open(ztest_opts.zo_pool, &spa, ((char *)__func__)) (0x0 == 0x5)' failed.
That's strange, I never got that with any recent version of the patch. I do get this error, however, when I forget to disable the reguid test in ztest, but this branch has this test disabled, so I assume that's not it. What is the filesystem behind your ztest working directory (by default)? In addition, can you try with 6a3ceba? That should be the first stable commit in the patch set.
That appears to be it. In my VM's
Okay, now this is getting very strange. Personally, I'm running ztest on top of tmpfs. I have two hypotheses regarding your issue:
Is the ext4 filesystem you're running ztest on mounted with the
@ryao: if you want to work on this, the best approach would probably be to use @skiselkov's code and mix it with some of my additional patches (like TRIM whole devices on add, etc.), since most of these are not related to the actual algorithm used to TRIM data blocks.
@ryao The latest patch version can be found here: http://cr.illumos.org/~webrev/skiselkov/zfs_unmap/ (did a quick rebase on latest illumos master). This implementation of trim works by tracking the metaslab allocator and issuing TRIMs (in a time-delayed fashion) on regions which the metaslab allocator has released. The tracking itself is done using in-core spacemaps, so we effectively coalesce adjacent extents. I try to keep stuff reasonably documented, but in case there are unclear portions, give me a kick.
I just set up a virtual machine that has access to two real SSD partitions in order to help test TRIM on ZoL. Do you have a more-or-less current rebase to something close to ZoL HEAD for me? Please also point me to any stress-testing tools that you know of.
@skiselkov That sounds sufficient. Just put a pool on an iSCSI LUN that is mapped to a zvol and you should be able to test the TRIM functionality. The only difference is that ZFS won't be simulating the deficiencies of real controllers. @niekbergboer There is no rebase yet, but I will see about adapting @skiselkov's work soon.
On Linux a simpler solution is to use a ramdisk (rd) as it supports DISCARD operations; however, keep https://bugzilla.kernel.org/show_bug.cgi?id=47711 in mind.
@skiselkov May I ask, what is the status of this port?
@sopmot It's a work in progress, not yet done.
Hi all, we would love to see TRIM support in ZFS on Linux - how can we support this pull request (e.g. by providing test hardware with many SSDs and full SSH access)? Many thanks for all your hard work and efforts!! Best regards
I added support for the new queuing rewrite to TRIM in FreeBSD, so it may be worth anyone working on the metaslab version looking to include that. It does currently have some temporary fixes for IO reordering panics which I expect to change when I hear back from George Wilson on his fixes for that, but the core is solid.
This pull request is going to need to be updated to use the latest code from George. We certainly want to avoid doing our own thing if possible.
Hi all, we are seriously interested in supporting this pull request and are also ready to donate towards the development. As SSD storage becomes more and more of a standard, we think this may be a big step forward for the ZFS on Linux project. Somebody interested out there? We look forward to the feedback. Best regards,
Agree, would like to see this one prioritized. What work is remaining at this point? A rebase of the patch series?
I asked @ryao's opinion about this on IRC a couple of weeks ago. My understanding is that he would otherwise be interested. :) We also briefly discussed that an fstrim-like solution would probably be the easiest and fastest way, and it may be the best solution (if I recall correctly...).
For some (myself included) a trim of log and l2 devices at boot / import would go a long way, and shouldn't (right? they start "empty" -- after any log replay if needed -- each time) be too involved...
👍
Great news! It means we can drop this pull request (which has bugs that I never got around to fixing, and is probably technically inferior anyway) and just implement the Linux DISCARD calls to back the Illumos stuff. That should be much easier.
Awesome news! I could test it asap! kpande, could you explain how trim could hurt my SSD's performance? From my point of view it's aimed at increasing the performance of my drive.
The "discard" mount option slows down writes considerably, and it could have a detrimental effect on the longevity of your SSD.
Hey guys, I'm the author of this. I'd caution against early porting. This is still a WIP; we need to figure out the whitelist of devices where we can safely enable this by default. Also, raidz is currently untested, so it might eat data, for all I know. Lastly, I'd like to get on-demand trim (aka "fstrim") implemented as well before I'd consider this stuff "ready".
I don't think it's part of public builds yet. It literally got rolled into the gate this Monday to get it into our internal QA builds so QA folk can share in the joys of losing data and having unexplained scrub errors :P I guess people could try and build their own, but that's not for the faint of heart, as this stuff is literally hot off the press. Give me a few more weeks to get it to work well :)
Any update on this? This feature is really useful... at least with fstrim, which can be launched at any time and does not slow down writes.
Any status on this?
@Marlinc #3656 has been working well for me in testing so far. I've now tested with more complex vdev arrangements: raidz, raidz2, raidz3, etc. It was rebased against a post-0.6.5 master over this weekend. I added vdev-level tracing support to handle its new DTRACE2 call and possibly other vdev-related tracing in the future. I still want to augment the documentation a bit and to investigate the txg batching feature somewhat (at the very least, we need a nonzero check for one of the new module parameters).
Closing; we'll pick up this support once it's finalized in illumos. In the meantime @dweeezil has ported this change to Linux and it is working well.
This adds TRIM (a.k.a UNMAP, DISCARD, hole punching) support for vdevs. The original patch is from Pawel Jakub Dawidek who wrote it for FreeBSD. I just ported it to ZFS On Linux with added features and fixed bugs. This is a clean version of #924. Note that you need openzfs/spl#168 to build this.
This patch should be 99% stable. I'm still not sure about the last 1% because I'm sometimes having ztest failures which are difficult to reproduce and might be caused by ZoL ztest bugs. Unfortunately, I'm running out of time and I won't be able to investigate any further, so I'm leaving this to anyone interested.
In any case, if there are indeed bugs, actual failures should be extremely rare and are very unlikely to cause any data corruption. Nevertheless, I would advise anyone playing with this to err on the safe side with a good backup policy, at least until the code is thoroughly reviewed.
@behlendorf: you told me George Wilson is interested in this work. You should direct him to this pull request for review. Hopefully he'll be able to tell if there are stability issues with this code or not.
NOTE: it seems GitHub displays commits in chronological order in this pull request, as opposed to the logical (parent-child) order, which makes it harder to understand and review. If you want to review this, I suggest you use the compare view, which lists commits in their actual order.
Features
TODO list
Usage instructions
Note that the new feature is enabled by default. To disable it, pass `zfs_notrim=1` as a module parameter. For TRIM to work, you need vdevs that support it, e.g. SSDs. If you don't have any, you can still test using a Linux RAM block device (the `rd` module), as it supports discards. Note that if you choose to test with `rd`, make sure to use `ashift=12`, else you will trigger a corruption bug in `rd`.
There is a module parameter (`trim_txg_limit`) that specifies after how many TXGs the freed data is actually discarded on the device (default 32). Indeed, the algorithm doesn't discard data immediately so that pool disaster recovery (i.e. `zpool -F`) doesn't break. I suggest you test with the default and with a low value (e.g. 1 or 0) to make the algorithm more aggressive, which might trigger different bugs.
There is a module parameter (`trim_txg_batch`) that specifies how many TXGs are squashed together in order to batch (merge) TRIM requests (default 32). The optimal value heavily depends on your workload and the performance characteristics of your physical vdevs. I suggest you test with the default and with a low value (e.g. 1 or 0) to make the algorithm more aggressive, which might trigger different bugs.
Module parameters `trim_l2arc_limit` and `trim_l2arc_batch` work the same way but are specific to cache devices. They are expressed in seconds instead of TXGs.
There is a module parameter (`zfs_trim_zero`) that, when enabled, makes sure TRIMmed data is zeroed. Note that for disk vdevs, this effectively means that TRIM is disabled and replaced by "write zeroes". This mode allows ztest to test TRIM even on filesystems that don't support file hole punching. Outside ztest, it can also be used to diagnose corruption issues more reliably. Note that, for obvious reasons, performance greatly suffers in this mode.
The patch introduces a new kstat page in `/proc/spl/kstat/zfs/zio_trim`. It allows you to quickly check if TRIM is actually working or not. Keep in mind that if you're using a high `trim_txg_limit` (e.g. 64), then it may take a while before TRIM requests are actually issued.
TRIM support on file vdevs (a.k.a. file hole punching)
There are a few prerequisites for file (not disk) vdev support using the Linux file hole punching capabilities:
- The filesystem backing your file vdevs must support hole punching; few Linux filesystems currently do (OCFS2, XFS, and ext4). tmpfs supports the `truncate_range` interface, so it should work (albeit only from kernel space). Ironically, ZFS itself doesn't (yet).
- You need to build against a kernel that supports file hole punching (i.e. Linux >= 2.6.38).
- If you want to use it from userspace (e.g. `ztest`), be sure that your installed Linux headers (i.e. `/usr/include/linux`) are up to date (>= 2.6.38). Also, as mentioned above, if you're using `ztest`, be sure to run it on a filesystem that supports file hole punching (tmpfs doesn't). Note that you can't change parameters using `ztest`, so you'll have to hack `trim_map.c` to manually set `zfs_notrim=0` by default and rebuild.
A good way to be sure that TRIM support on file vdevs works is to do something like the following with `trim_txg_limit=0` and `trim_txg_batch=0`, assuming `/store` is on ext4 or another filesystem with file hole punching support: create a pool backed by a file under `/store`, write some data to it, then delete the data and watch the file's allocated size (e.g. as reported by `du`) shrink as holes are punched. A small helper for checking the allocated size follows below.
Again, this section only applies to file vdevs, not disk (block device) vdevs. Most users have no use for TRIM support on file vdevs.