WARNING: possible circular locking dependency detected (io_rsrc_put_work+0x13d/0x1a0) #404

ammarfaizi2 · 2021-08-09T09:56:15Z

Hi Jens,
I got a test failed with a potential deadlock at <rsrc_tags>.

Kernel: 5.14.0-rc3-bluetea-test-00249-gc7d102232649

[ 9257.317473] Running test 232c93d07b74-test:
[ 9261.828744] Running test 35fa71a030ca-test:
[ 9321.858788] Running test 500f9fbadef8-test:
[ 9323.308544] Running test 7ad0e4b2f83c-test:
[ 9324.419221] Running test 8a9973408177-test:
[ 9325.635902] Running test 917257daa0fe-test:
[ 9325.755814] Running test a0908ae19763-test:
[ 9325.871406] Running test a4c0b3decb33-test:
[ 9340.684624] Running test accept:
[ 9342.070441] Running test accept-link:
[ 9342.389549] Running test accept-reuse:
[ 9342.511045] Running test accept-test:
[ 9342.627728] Running test across-fork:
[ 9342.745884] Running test splice:
[ 9342.872690] Running test b19062a56726-test:
[ 9342.981963] Running test b5837bd5311d-test:
[ 9343.100657] Running test ce593a6c480a-test:
[ 9344.229048] Running test close-opath:
[ 9344.339398] Running test connect:
[ 9344.363080] TCP: request_sock_TCP: Possible SYN flooding on port 29878. Sending cookies.  Check SNMP counters.
[ 9344.453743] Running test cq-full:
[ 9344.563967] Running test cq-overflow:
[ 9355.410900] Running test cq-peek-batch:
[ 9355.522211] Running test cq-ready:
[ 9355.635496] Running test cq-size:
[ 9355.747607] Running test d4ae271dfaae-test:
[ 9356.041987] Running test d77a67ed5f27-test:
[ 9356.303955] Running test defer:
[ 9359.413848] Running test double-poll-crash:
[ 9359.528898] Running test eeed8b54e0df-test:
[ 9359.683170] Running test eventfd:
[ 9359.814646] Running test eventfd-disable:
[ 9359.945202] Running test eventfd-ring:
[ 9360.092271] Running test fadvise:
[ 9360.413969] Running test fallocate:
[ 9360.561731] Running test fc2a85cb02ef-test:
[ 9360.677307] Running test file-register:
[ 9378.660480] Running test file-update:
[ 9381.894220] Running test files-exit-hang-poll:
[ 9382.996801] Running test files-exit-hang-timeout:
[ 9384.095303] Running test fixed-link:
[ 9384.194950] Running test fsync:
[ 9384.422513] Running test hardlink:
[ 9384.528068] Running test io-cancel:
[ 9387.144482] Running test io_uring_enter:
[ 9387.506670] Running test io_uring_register:
[ 9389.784601] Running test io_uring_setup:
[ 9389.892289] Running test iopoll:
[ 9402.967112] Running test lfs-openat:
[ 9403.070135] Running test lfs-openat-write:
[ 9403.173498] Running test link:
[ 9403.313011] Running test link-timeout:
[ 9404.916786] Running test link_drain:
[ 9405.841861] Running test madvise:
[ 9406.152754] Running test mkdir:
[ 9406.256519] Running test multicqes_drain:
[ 9451.376548] Running test nop:
[ 9451.477609] Running test nop-all-sizes:
[ 9451.622920] Running test open-close:
[ 9451.868377] Running test openat2:
[ 9452.092030] Running test personality:
[ 9452.197029] Running test pipe-eof:
[ 9452.301000] Running test pipe-reuse:
[ 9452.409544] Running test poll:
[ 9452.512898] Running test poll-cancel:
[ 9452.613946] Running test poll-cancel-ton:
[ 9453.008100] Running test poll-link:
[ 9453.316323] Running test poll-many:
[ 9465.760035] Running test poll-mshot-update:
[ 9474.184772] Running test poll-ring:
[ 9474.294344] Running test poll-v-poll:
[ 9474.815263] Running test probe:
[ 9474.925634] Running test read-write:
[ 9519.635522] Running test register-restrictions:
[ 9519.995146] Running test rename:
[ 9520.108175] Running test ring-leak:
[ 9520.246420] Running test ring-leak2:
[ 9521.360344] Running test rw_merge_test:
[ 9521.470459] Running test self:
[ 9521.582026] Running test send_recv:
[ 9521.702652] Running test send_recvmsg:
[ 9521.820930] Running test shared-wq:
[ 9521.934517] Running test short-read:
[ 9522.136810] Running test shutdown:
[ 9522.253045] Running test sigfd-deadlock:
[ 9522.372489] Running test socket-rw:
[ 9522.485295] Running test socket-rw-eagain:
[ 9522.597577] Running test sq-full:
[ 9522.707849] Running test sq-poll-dup:
[ 9526.798436] Running test sq-poll-kthread:
[ 9529.006681] Running test sq-poll-share:
[ 9536.551806] Running test sqpoll-disable-exit:
[ 9537.131130] Running test sqpoll-exit-hang:
[ 9538.244750] Running test sqpoll-cancel-hang:
[ 9539.352283] Running test sqpoll-sleep:
[ 9539.466045] Running test sq-space_left:
[ 9539.576885] Running test stdout:
[ 9541.760714] Running test submit-reuse:
[ 9545.929462] Running test symlink:
[ 9546.033826] Running test teardowns:
[ 9546.236248] Running test thread-exit:
[ 9546.423347] Running test timeout:
[ 9552.550575] Running test timeout-new:
[ 9555.062722] Running test timeout-overflow:
[ 9555.168925] Running test unlink:
[ 9555.268847] Running test wakeup-hang:
[ 9557.369264] Running test sendmsg_fs_cve:
[ 9557.476769] Running test rsrc_tags:

[ 9568.748103] ======================================================
[ 9568.748111] WARNING: possible circular locking dependency detected
[ 9568.748120] 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Tainted: G           OE    
[ 9568.748130] ------------------------------------------------------
[ 9568.748136] kworker/2:4/2684 is trying to acquire lock:
[ 9568.748145] ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0
[ 9568.748183] 
               but task is already holding lock:
[ 9568.748190] ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
[ 9568.748219] 
               which lock already depends on the new lock.

[ 9568.748226] 
               the existing dependency chain (in reverse order) is:
[ 9568.748232] 
               -> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}:
[ 9568.748249]        __flush_work+0x31b/0x490
[ 9568.748263]        io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0
[ 9568.748276]        __do_sys_io_uring_register+0x45b/0x1060
[ 9568.748289]        do_syscall_64+0x35/0xb0
[ 9568.748301]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 9568.748316] 
               -> #0 (&ctx->uring_lock){+.+.}-{3:3}:
[ 9568.748332]        __lock_acquire+0x119a/0x1e10
[ 9568.748344]        lock_acquire+0xc8/0x2f0
[ 9568.748355]        __mutex_lock+0x86/0x740
[ 9568.748365]        io_rsrc_put_work+0x13d/0x1a0
[ 9568.748376]        process_one_work+0x236/0x530
[ 9568.748388]        worker_thread+0x52/0x3b0
[ 9568.748400]        kthread+0x135/0x160
[ 9568.748411]        ret_from_fork+0x1f/0x30
[ 9568.748426] 
               other info that might help us debug this:

[ 9568.748432]  Possible unsafe locking scenario:

[ 9568.748438]        CPU0                    CPU1
[ 9568.748443]        ----                    ----
[ 9568.748448]   lock((work_completion)(&(&ctx->rsrc_put_work)->work));
[ 9568.748459]                                lock(&ctx->uring_lock);
[ 9568.748469]                                lock((work_completion)(&(&ctx->rsrc_put_work)->work));
[ 9568.748479]   lock(&ctx->uring_lock);
[ 9568.748488] 
                *** DEADLOCK ***

[ 9568.748494] 2 locks held by kworker/2:4/2684:
[ 9568.748503]  #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
[ 9568.748531]  #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530
[ 9568.748558] 
               stack backtrace:
[ 9568.748566] CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G           OE     5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5
[ 9568.748580] Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015
[ 9568.748589] Workqueue: events io_rsrc_put_work
[ 9568.748605] Call Trace:
[ 9568.748620]  dump_stack_lvl+0x6a/0x9a
[ 9568.748639]  check_noncircular+0xfe/0x110
[ 9568.748660]  __lock_acquire+0x119a/0x1e10
[ 9568.748680]  lock_acquire+0xc8/0x2f0
[ 9568.748692]  ? io_rsrc_put_work+0x13d/0x1a0
[ 9568.748712]  __mutex_lock+0x86/0x740
[ 9568.748724]  ? io_rsrc_put_work+0x13d/0x1a0
[ 9568.748738]  ? io_rsrc_put_work+0x13d/0x1a0
[ 9568.748756]  ? io_rsrc_put_work+0x13d/0x1a0
[ 9568.748767]  ? process_one_work+0x1ce/0x530
[ 9568.748779]  io_rsrc_put_work+0x13d/0x1a0
[ 9568.748796]  process_one_work+0x236/0x530
[ 9568.748816]  worker_thread+0x52/0x3b0
[ 9568.748830]  ? process_one_work+0x530/0x530
[ 9568.748845]  kthread+0x135/0x160
[ 9568.748857]  ? set_kthread_struct+0x40/0x40
[ 9568.748872]  ret_from_fork+0x1f/0x30
[ 9579.199425] Running test statx:
[ 9579.337521] Running test sq-full-cpp:

Cc: @isilence

The text was updated successfully, but these errors were encountered:

axboe · 2021-08-09T14:15:36Z

Can you try and add this on top of your kernel?

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6a8257233061..0c3e90f974e9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7195,17 +7195,19 @@ static int io_rsrc_ref_quiesce(struct io_rsrc_data *data, struct io_ring_ctx *ct
 		/* kill initial ref, already quiesced if zero */
 		if (atomic_dec_and_test(&data->refs))
 			break;
+		mutex_unlock(&ctx->uring_lock);
 		flush_delayed_work(&ctx->rsrc_put_work);
 		ret = wait_for_completion_interruptible(&data->done);
-		if (!ret)
+		if (!ret) {
+			mutex_lock(&ctx->uring_lock);
 			break;
+		}
 
 		atomic_inc(&data->refs);
 		/* wait for all works potentially completing data->done */
 		flush_delayed_work(&ctx->rsrc_put_work);
 		reinit_completion(&data->done);
 
-		mutex_unlock(&ctx->uring_lock);
 		ret = io_run_task_work_sig();
 		mutex_lock(&ctx->uring_lock);
 	} while (ret >= 0);

ammarfaizi2 · 2021-08-10T01:25:10Z

Can you try and add this on top of your kernel?

Alright, this patch is a cure. All tests passed. Thanks. Does this affect the stable kernel?

Tested-by: Ammar Faizi <[email protected]>

Test output

ammarfaizi2@integral:~/project/now/liburing$ git log -n1
commit 0ea4ccd1c0e4bb05cb4471d64e55d2e7d4e8a24c (HEAD -> master, origin/master, origin/HEAD)
Author: Jens Axboe <[email protected]>
Date:   Sun Aug 8 21:32:29 2021 -0600

    src/queue: don't flush SQ ring for new wait interface
    
    If we have IORING_FEAT_EXT_ARG, then timeouts are done through the
    syscall instead of by posting an internal timeout. This was done
    to be both more efficient, but also to enable multi-threaded use
    the wait side. If we touch the SQ state by flushing it, that isn't
    safe without synchronization.
    
    Fixes: https://github.com/axboe/liburing/issues/402
    Signed-off-by: Jens Axboe <[email protected]>
ammarfaizi2@integral:~/project/now/liburing$ sudo taskset -c 0,1,2,3 make -j4 runtests
make[1]: Entering directory '/home/ammarfaizi2/project/now/liburing/src'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/ammarfaizi2/project/now/liburing/src'
make[1]: Entering directory '/home/ammarfaizi2/project/now/liburing/test'
     CC 232c93d07b74-test
     CC 35fa71a030ca-test
     CC 500f9fbadef8-test
     CC 7ad0e4b2f83c-test
     CC 8a9973408177-test
     CC 917257daa0fe-test
     CC a0908ae19763-test
     CC a4c0b3decb33-test
     CC accept
     CC accept-link
     CC accept-reuse
     CC accept-test
     CC across-fork
     CC splice
     CC b19062a56726-test
     CC b5837bd5311d-test
     CC ce593a6c480a-test
     CC close-opath
     CC connect
     CC cq-full
     CC cq-overflow
     CC cq-peek-batch
     CC cq-ready
     CC cq-size
     CC d4ae271dfaae-test
     CC d77a67ed5f27-test
     CC defer
     CC double-poll-crash
     CC eeed8b54e0df-test
     CC eventfd
     CC eventfd-disable
     CC eventfd-ring
     CC fadvise
     CC fallocate
     CC fc2a85cb02ef-test
     CC file-register
     CC file-update
     CC files-exit-hang-poll
     CC files-exit-hang-timeout
     CC fixed-link
     CC fsync
     CC hardlink
     CC io-cancel
     CC io_uring_enter
     CC io_uring_register
     CC io_uring_setup
     CC iopoll
     CC lfs-openat
     CC lfs-openat-write
     CC link
     CC link-timeout
     CC link_drain
     CC madvise
     CC mkdir
     CC multicqes_drain
     CC nop
     CC nop-all-sizes
     CC open-close
     CC openat2
     CC personality
     CC pipe-eof
     CC pipe-reuse
     CC poll
     CC poll-cancel
     CC poll-cancel-ton
     CC poll-link
     CC poll-many
     CC poll-mshot-update
     CC poll-ring
     CC poll-v-poll
     CC probe
     CC read-write
     CC register-restrictions
     CC rename
     CC ring-leak
     CC ring-leak2
     CC rw_merge_test
     CC self
     CC send_recv
     CC send_recvmsg
     CC shared-wq
     CC short-read
     CC shutdown
     CC sigfd-deadlock
     CC socket-rw
     CC socket-rw-eagain
     CC sq-full
     CC sq-poll-dup
     CC sq-poll-kthread
     CC sq-poll-share
     CC sqpoll-disable-exit
     CC sqpoll-exit-hang
     CC sqpoll-cancel-hang
     CC sqpoll-sleep
     CC sq-space_left
     CC stdout
     CC submit-reuse
     CC symlink
     CC teardowns
     CC thread-exit
     CC timeout
     CC timeout-new
     CC timeout-overflow
     CC unlink
     CC wakeup-hang
     CC sendmsg_fs_cve
     CC rsrc_tags
     CC statx
     CXX sq-full-cpp
make[1]: Leaving directory '/home/ammarfaizi2/project/now/liburing/test'
make[1]: Entering directory '/home/ammarfaizi2/project/now/liburing/examples'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/ammarfaizi2/project/now/liburing/examples'
make[1]: Entering directory '/home/ammarfaizi2/project/now/liburing/test'
Running test 232c93d07b74-test:
Running test 35fa71a030ca-test:
Test 35fa71a030ca-test timed out (may not be a failure)
Running test 500f9fbadef8-test:
Running test 7ad0e4b2f83c-test:
Running test 8a9973408177-test:
Running test 917257daa0fe-test:
Running test a0908ae19763-test:
Running test a4c0b3decb33-test:
Running test accept:
Running test accept-link:
Running test accept-reuse:
Running test accept-test:
Running test across-fork:
Running test splice:
Running test b19062a56726-test:
Running test b5837bd5311d-test:
Running test ce593a6c480a-test:
Running test close-opath:
Running test connect:
Running test cq-full:
Running test cq-overflow:
Running test cq-peek-batch:
Running test cq-ready:
Running test cq-size:
Running test d4ae271dfaae-test:
Running test d77a67ed5f27-test:
Running test defer:
Running test double-poll-crash:
Running test eeed8b54e0df-test:
Running test eventfd:
Running test eventfd-disable:
Running test eventfd-ring:
Running test fadvise:
Running test fallocate:
Running test fc2a85cb02ef-test:
Test needs failslab/fail_futex/fail_page_alloc enabled, skipped
Running test file-register:
Running test file-update:
Running test files-exit-hang-poll:
Running test files-exit-hang-timeout:
Running test fixed-link:
Running test fsync:
Running test hardlink:
linkat not supported, skipping
Running test io-cancel:
Running test io_uring_enter:
io_uring_enter(3, 1, 0, 4294967295, (nil))
io_uring_enter(-1, 0, 0, 0, (nil))
io_uring_enter(0, 0, 0, 0, (nil))
io_uring_enter(3, 0, 0, 0, (nil))
Allocating 4096 sqes
Submitting 4096 I/Os
Done
Waiting for 4096 events
Reaping 4096 I/Os
Submitting invalid sqe index.
PASS
Running test io_uring_register:
RELIMIT_MEMLOCK: 1210114048 (1210114048)
io_uring_register(-1, 0, (nil), 0)
io_uring_register(3, 0, (nil), 0)
io_uring_register(4, 4294967295, (nil), 0)
io_uring_register(4, 0, 0x7fff62104b50, 1)
io_uring_register(4, 0, 0x7fff62104b50, 1)
io_uring_register(4, 0, 0x7fff62104b50, 1)
Unable to map a huge page.  Try increasing /proc/sys/vm/nr_hugepages by at least 1.
Skipping the hugepage test
reserve file-backed buffers
io_uring_register(4, 0, 0x7fff62104b50, 1)
io_uring_register(4, 0, 0x7f0ddab0d010, 1000000)
io_uring_register(4, 0, 0x7f0ddab0d010, 1024)
Not enough memory for this test, skipping
io_uring_submit:
	opcode: 6
	flags:  0x00000000
	fd:     4
	poll_events: 0x00000005
io_uring_register(4, 2, 0x7fff62104a9c, 1)
PASS
Running test io_uring_setup:
io_uring_setup(0, 0x7ffda11ffbb0), flags: none, feat: none, resv: 0x00000000 0x00000000 0x00000000, sq_thread_cpu: 0
io_uring_setup(1, (nil)), flags: none, feat: none, resv: , sq_thread_cpu: 0
io_uring_setup(1, 0x7ffda11ffbb0), flags: none, feat: none, resv: 0x00000001 0x00000001 0x00000001, sq_thread_cpu: 0
io_uring_setup(1, 0x7ffda11ffbb0), flags: 0xffffffff, feat: none, resv: 0x00000000 0x00000000 0x00000000, sq_thread_cpu: 0
io_uring_setup(1, 0x7ffda11ffbb0), flags: IORING_SETUP_SQ_AFF, feat: none, resv: 0x00000000 0x00000000 0x00000000, sq_thread_cpu: 0
io_uring_setup(1, 0x7ffda11ffbb0), flags: IORING_SETUP_SQPOLL|IORING_SETUP_SQ_AFF, feat: none, resv: 0x00000000 0x00000000 0x00000000, sq_thread_cpu: 4
PASS
Running test iopoll:
Running test lfs-openat:
Running test lfs-openat-write:
Running test link:
Running test link-timeout:
Running test link_drain:
Running test madvise:
Suspicious timings (62 > 38)
Running test mkdir:
mkdirat not supported, skipping
Running test multicqes_drain:
Running test nop:
Running test nop-all-sizes:
Running test open-close:
Running test openat2:
Running test personality:
Running test pipe-eof:
Running test pipe-reuse:
Running test poll:
Running test poll-cancel:
Running test poll-cancel-ton:
Running test poll-link:
Running test poll-many:
Running test poll-mshot-update:
Running test poll-ring:
Running test poll-v-poll:
Running test probe:
Running test read-write:
Running test register-restrictions:
Running test rename:
Running test ring-leak:
Running test ring-leak2:
Running test rw_merge_test:
Running test self:
Running test send_recv:
Running test send_recvmsg:
Running test shared-wq:
Running test short-read:
Running test shutdown:
Running test sigfd-deadlock:
Running test socket-rw:
Running test socket-rw-eagain:
Running test sq-full:
Running test sq-poll-dup:
Running test sq-poll-kthread:
Running test sq-poll-share:
Running test sqpoll-disable-exit:
Running test sqpoll-exit-hang:
Running test sqpoll-cancel-hang:
Running test sqpoll-sleep:
Running test sq-space_left:
Running test stdout:
This is a pipe test
This is a fixed pipe test
Running test submit-reuse:
Running test symlink:
symlinkat not supported, skipping
Running test teardowns:
Running test thread-exit:
Running test timeout:
Running test timeout-new:
Running test timeout-overflow:
Skipping
Running test unlink:
Running test wakeup-hang:
Running test sendmsg_fs_cve:
Running test rsrc_tags:
Running test statx:
Running test sq-full-cpp:
All tests passed
make[1]: Leaving directory '/home/ammarfaizi2/project/now/liburing/test'
ammarfaizi2@integral:~/project/now/liburing$

Kernel Log

<3>[  309.627538][    T5] Bluetooth: hci0: command 0x1003 tx timeout
<3>[  309.638176][ T2475] Bluetooth: hci0: unexpected event for opcode 0x1003
<6>[  311.483479][ T2759] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
<6>[  314.627403][ T3050] io scheduler bfq registered
<6>[  318.646697][ T3247] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
<6>[  346.203477][ T3563] Bluetooth: RFCOMM TTY layer initialized
<6>[  346.203634][ T3563] Bluetooth: RFCOMM socket layer initialized
<6>[  346.203751][ T3563] Bluetooth: RFCOMM ver 1.11
<4>[  347.701283][   T44] kauditd_printk_skb: 29 callbacks suppressed
<5>[  347.701296][   T44] audit: type=1400 audit(1628557064.993:41): apparmor="DENIED" operation="capable" profile="libvirtd" pid=2908 comm="daemon-init" capability=17  capname="sys_rawio"
<5>[  359.290314][   T44] audit: type=1400 audit(1628557076.582:42): apparmor="DENIED" operation="capable" profile="libvirtd" pid=2908 comm="daemon-init" capability=17  capname="sys_rawio"
<6>[  381.441045][ T3636] loop8: detected capacity change from 0 to 8
<12>[  420.100770][ T3786] Running test 232c93d07b74-test:
<12>[  424.869089][ T3805] Running test 35fa71a030ca-test:
<12>[  484.808620][T14563] Running test 500f9fbadef8-test:
<12>[  490.415422][T14975] Running test 7ad0e4b2f83c-test:
<12>[  491.705790][T14986] Running test 8a9973408177-test:
<12>[  493.163207][T14998] Running test 917257daa0fe-test:
<12>[  493.461249][T15010] Running test a0908ae19763-test:
<12>[  493.740508][T15021] Running test a4c0b3decb33-test:
<12>[  536.701528][T25041] Running test accept:
<12>[  540.261432][T25053] Running test accept-link:
<12>[  540.769375][T25067] Running test accept-reuse:
<12>[  541.069828][T25078] Running test accept-test:
<12>[  541.368776][T25089] Running test across-fork:
<12>[  541.775466][T25103] Running test splice:
<12>[  542.095216][T25117] Running test b19062a56726-test:
<12>[  542.396804][T25128] Running test b5837bd5311d-test:
<12>[  542.700743][T25139] Running test ce593a6c480a-test:
<12>[  544.106692][T25151] Running test close-opath:
<12>[  544.473766][T25162] Running test connect:
<6>[  544.606850][    C3] TCP: request_sock_TCP: Possible SYN flooding on port 32022. Sending cookies.  Check SNMP counters.
<12>[  544.820126][T25173] Running test cq-full:
<12>[  545.108846][T25184] Running test cq-overflow:
<12>[  556.738449][T25196] Running test cq-peek-batch:
<12>[  557.077184][T25207] Running test cq-ready:
<12>[  557.387326][T25218] Running test cq-size:
<12>[  557.706220][T25229] Running test d4ae271dfaae-test:
<12>[  558.078277][T25241] Running test d77a67ed5f27-test:
<12>[  558.514842][T25253] Running test defer:
<12>[  561.822918][T25265] Running test double-poll-crash:
<12>[  562.201123][T25282] Running test eeed8b54e0df-test:
<12>[  562.596434][T25293] Running test eventfd:
<12>[  562.923280][T25304] Running test eventfd-disable:
<12>[  563.218718][T25315] Running test eventfd-ring:
<12>[  563.689836][T25327] Running test fadvise:
<12>[  564.216050][T25340] Running test fallocate:
<12>[  564.524538][T25352] Running test fc2a85cb02ef-test:
<12>[  564.812246][T25363] Running test file-register:
<12>[  584.439917][T25379] Running test file-update:
<12>[  587.998315][T25391] Running test files-exit-hang-poll:
<12>[  589.287710][T25402] Running test files-exit-hang-timeout:
<12>[  590.585481][T25413] Running test fixed-link:
<12>[  590.884950][T25425] Running test fsync:
<12>[  591.346689][T25438] Running test hardlink:
<12>[  591.667178][T25449] Running test io-cancel:
<12>[  594.542376][T25468] Running test io_uring_enter:
<12>[  595.220557][T25480] Running test io_uring_register:
<12>[  598.023906][T25493] Running test io_uring_setup:
<12>[  598.311982][T25504] Running test iopoll:
<12>[  612.402572][T25521] Running test lfs-openat:
<12>[  612.707410][T25535] Running test lfs-openat-write:
<12>[  613.017022][T25547] Running test link:
<12>[  613.339886][T25558] Running test link-timeout:
<12>[  615.141145][T25572] Running test link_drain:
<12>[  618.781080][T25587] Running test madvise:
<12>[  619.517360][T25623] Running test mkdir:
<12>[  619.895954][T25634] Running test multicqes_drain:
<12>[  665.327107][T25649] Running test nop:
<12>[  667.299958][T25660] Running test nop-all-sizes:
<12>[  667.762178][T25671] Running test open-close:
<12>[  668.199852][T25682] Running test openat2:
<12>[  668.626481][T25693] Running test personality:
<12>[  668.924627][T25704] Running test pipe-eof:
<12>[  669.231073][T25717] Running test pipe-reuse:
<12>[  669.527250][T25729] Running test poll:
<12>[  669.840106][T25741] Running test poll-cancel:
<12>[  670.118957][T25752] Running test poll-cancel-ton:
<12>[  672.665516][T25763] Running test poll-link:
<12>[  673.166575][T25777] Running test poll-many:
<12>[  721.799874][T25788] Running test poll-mshot-update:
<12>[  756.551995][T26799] Running test poll-ring:
<12>[  756.878452][T26810] Running test poll-v-poll:
<12>[  757.591800][T26835] Running test probe:
<12>[  757.932940][T26846] Running test read-write:
<12>[  806.131239][T26997] Running test register-restrictions:
<12>[  806.709986][T27010] Running test rename:
<12>[  807.064036][T27022] Running test ring-leak:
<12>[  807.369705][T27036] Running test ring-leak2:
<12>[  808.667244][T27051] Running test rw_merge_test:
<12>[  808.954251][T27064] Running test self:
<12>[  809.260712][T27076] Running test send_recv:
<12>[  809.575584][T27092] Running test send_recvmsg:
<12>[  809.925982][T27107] Running test shared-wq:
<12>[  810.254864][T27118] Running test short-read:
<12>[  810.609793][T27129] Running test shutdown:
<12>[  810.911959][T27141] Running test sigfd-deadlock:
<12>[  811.201698][T27152] Running test socket-rw:
<12>[  811.514685][T27163] Running test socket-rw-eagain:
<12>[  811.795975][T27174] Running test sq-full:
<12>[  812.101141][T27185] Running test sq-poll-dup:
<12>[  822.053145][T27199] Running test sq-poll-kthread:
<12>[  824.849719][T27222] Running test sq-poll-share:
<12>[  846.004185][T27234] Running test sqpoll-disable-exit:
<12>[  847.467230][T27448] Running test sqpoll-exit-hang:
<12>[  848.759199][T27460] Running test sqpoll-cancel-hang:
<12>[  850.059074][T27473] Running test sqpoll-sleep:
<12>[  850.345403][T27485] Running test sq-space_left:
<12>[  850.668587][T27496] Running test stdout:
<12>[  853.228854][T27509] Running test submit-reuse:
<12>[  871.202048][T27588] Running test symlink:
<12>[  871.516765][T27599] Running test teardowns:
<12>[  872.263552][T27652] Running test thread-exit:
<12>[  872.605036][T27679] Running test timeout:
<12>[  878.922230][T27693] Running test timeout-new:
<12>[  881.638133][T27707] Running test timeout-overflow:
<12>[  881.924161][T27718] Running test unlink:
<12>[  882.236647][T27730] Running test wakeup-hang:
<12>[  884.539288][T27745] Running test sendmsg_fs_cve:
<12>[  884.851683][T27758] Running test rsrc_tags:
<12>[  907.323541][T27773] Running test statx:
<12>[  907.695528][T27785] Running test sq-full-cpp:
ammarfaizi2@integral:~/project/now/liburing$

axboe · 2021-08-10T02:00:03Z

It does affect stable, at least recent stable. I'll mark it for backport. Thanks for reporting, and testing!

Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags from the regression suite: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Tainted: G OE ------------------------------------------------------ kworker/2:4/2684 is trying to acquire lock: ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0 but task is already holding lock: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}: __flush_work+0x31b/0x490 io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0 __do_sys_io_uring_register+0x45b/0x1060 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&ctx->uring_lock){+.+.}-{3:3}: __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 __mutex_lock+0x86/0x740 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 kthread+0x135/0x160 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); *** DEADLOCK *** 2 locks held by kworker/2:4/2684: #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 stack backtrace: CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G OE 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015 Workqueue: events io_rsrc_put_work Call Trace: dump_stack_lvl+0x6a/0x9a check_noncircular+0xfe/0x110 __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 ? io_rsrc_put_work+0x13d/0x1a0 __mutex_lock+0x86/0x740 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? process_one_work+0x1ce/0x530 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 ? process_one_work+0x530/0x530 kthread+0x135/0x160 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 which is due to holding the ctx->uring_lock when flushing existing pending work, while the pending work flushing may need to grab the uring lock if we're using IOPOLL. Fix this by dropping the uring_lock a bit earlier as part of the flush. Link: axboe/liburing#404 Signed-off-by: Jens Axboe <[email protected]>

Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags from the regression suite: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Tainted: G OE ------------------------------------------------------ kworker/2:4/2684 is trying to acquire lock: ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0 but task is already holding lock: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}: __flush_work+0x31b/0x490 io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0 __do_sys_io_uring_register+0x45b/0x1060 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&ctx->uring_lock){+.+.}-{3:3}: __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 __mutex_lock+0x86/0x740 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 kthread+0x135/0x160 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); *** DEADLOCK *** 2 locks held by kworker/2:4/2684: #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 stack backtrace: CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G OE 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015 Workqueue: events io_rsrc_put_work Call Trace: dump_stack_lvl+0x6a/0x9a check_noncircular+0xfe/0x110 __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 ? io_rsrc_put_work+0x13d/0x1a0 __mutex_lock+0x86/0x740 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? process_one_work+0x1ce/0x530 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 ? process_one_work+0x530/0x530 kthread+0x135/0x160 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 which is due to holding the ctx->uring_lock when flushing existing pending work, while the pending work flushing may need to grab the uring lock if we're using IOPOLL. Fix this by dropping the uring_lock a bit earlier as part of the flush. Cc: [email protected] Link: axboe/liburing#404 Tested-by: Ammar Faizi <[email protected]> Signed-off-by: Jens Axboe <[email protected]>

Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags from the regression suite: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc3-bluetea-test-00249-gc7d102232649 csi2115-super#5 Tainted: G OE ------------------------------------------------------ kworker/2:4/2684 is trying to acquire lock: ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0 but task is already holding lock: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> csi2115-super#1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}: __flush_work+0x31b/0x490 io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0 __do_sys_io_uring_register+0x45b/0x1060 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&ctx->uring_lock){+.+.}-{3:3}: __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 __mutex_lock+0x86/0x740 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 kthread+0x135/0x160 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); *** DEADLOCK *** 2 locks held by kworker/2:4/2684: #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 csi2115-super#1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 stack backtrace: CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G OE 5.14.0-rc3-bluetea-test-00249-gc7d102232649 csi2115-super#5 Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015 Workqueue: events io_rsrc_put_work Call Trace: dump_stack_lvl+0x6a/0x9a check_noncircular+0xfe/0x110 __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 ? io_rsrc_put_work+0x13d/0x1a0 __mutex_lock+0x86/0x740 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? process_one_work+0x1ce/0x530 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 ? process_one_work+0x530/0x530 kthread+0x135/0x160 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 which is due to holding the ctx->uring_lock when flushing existing pending work, while the pending work flushing may need to grab the uring lock if we're using IOPOLL. Fix this by dropping the uring_lock a bit earlier as part of the flush. Cc: [email protected] Link: axboe/liburing#404 Tested-by: Ammar Faizi <[email protected]> Signed-off-by: Jens Axboe <[email protected]>

Ammar reports that he's seeing a lockdep splat on running test/rsrc_tags from the regression suite: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Tainted: G OE ------------------------------------------------------ kworker/2:4/2684 is trying to acquire lock: ffff88814bb1c0a8 (&ctx->uring_lock){+.+.}-{3:3}, at: io_rsrc_put_work+0x13d/0x1a0 but task is already holding lock: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}: __flush_work+0x31b/0x490 io_rsrc_ref_quiesce.part.0.constprop.0+0x35/0xb0 __do_sys_io_uring_register+0x45b/0x1060 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&ctx->uring_lock){+.+.}-{3:3}: __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 __mutex_lock+0x86/0x740 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 kthread+0x135/0x160 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); lock((work_completion)(&(&ctx->rsrc_put_work)->work)); lock(&ctx->uring_lock); *** DEADLOCK *** 2 locks held by kworker/2:4/2684: #0: ffff88810004d938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 #1: ffffc90001c6be70 ((work_completion)(&(&ctx->rsrc_put_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1bc/0x530 stack backtrace: CPU: 2 PID: 2684 Comm: kworker/2:4 Tainted: G OE 5.14.0-rc3-bluetea-test-00249-gc7d102232649 #5 Hardware name: Acer Aspire ES1-421/OLVIA_BE, BIOS V1.05 07/02/2015 Workqueue: events io_rsrc_put_work Call Trace: dump_stack_lvl+0x6a/0x9a check_noncircular+0xfe/0x110 __lock_acquire+0x119a/0x1e10 lock_acquire+0xc8/0x2f0 ? io_rsrc_put_work+0x13d/0x1a0 __mutex_lock+0x86/0x740 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? io_rsrc_put_work+0x13d/0x1a0 ? process_one_work+0x1ce/0x530 io_rsrc_put_work+0x13d/0x1a0 process_one_work+0x236/0x530 worker_thread+0x52/0x3b0 ? process_one_work+0x530/0x530 kthread+0x135/0x160 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x1f/0x30 which is due to holding the ctx->uring_lock when flushing existing pending work, while the pending work flushing may need to grab the uring lock if we're using IOPOLL. Fix this by dropping the uring_lock a bit earlier as part of the flush. Cc: [email protected] Link: axboe/liburing#404 Tested-by: Ammar Faizi <[email protected]> Signed-off-by: Jens Axboe <[email protected]>

axboe closed this as completed Aug 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WARNING: possible circular locking dependency detected (io_rsrc_put_work+0x13d/0x1a0) #404

WARNING: possible circular locking dependency detected (io_rsrc_put_work+0x13d/0x1a0) #404

ammarfaizi2 commented Aug 9, 2021

axboe commented Aug 9, 2021

ammarfaizi2 commented Aug 10, 2021 •

edited

Loading

axboe commented Aug 10, 2021

WARNING: possible circular locking dependency detected (io_rsrc_put_work+0x13d/0x1a0) #404

WARNING: possible circular locking dependency detected (io_rsrc_put_work+0x13d/0x1a0) #404

Comments

ammarfaizi2 commented Aug 9, 2021

axboe commented Aug 9, 2021

ammarfaizi2 commented Aug 10, 2021 • edited Loading

Test output

Kernel Log

axboe commented Aug 10, 2021

ammarfaizi2 commented Aug 10, 2021 •

edited

Loading