#0 0x00000000003e0692 in cli_hlt () at /home/tgrabiec/src/osv/arch/x64/processor.hh:242
#1 halt_no_interrupts () at /home/tgrabiec/src/osv/arch/x64/arch.hh:48
#2 osv::halt () at /home/tgrabiec/src/osv/core/power.cc:34
#3 0x00000000002232a5 in abort (fmt=fmt@entry=0x5e7140 "Assertion failed: %s (%s: %s: %d)\n") at /home/tgrabiec/src/osv/runtime.cc:143
#4 0x00000000002232e9 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /home/tgrabiec/src/osv/runtime.cc:149
#5 0x0000000000270470 in tcp_do_segment (m=m@entry=0xffffa0017f20e500, th=th@entry=0xffff800140fef02e, so=so@entry=0xffffa00109e84a00, tp=tp@entry=0xffffa00130062800, drop_hdrlen=0x42, tlen=tlen@entry=0x3d, iptos=iptos@entry=0x0, ti_locked=0x2, ti_locked@entry=0x1, want_close=@0xffff80010059deb0: 0x0) at /home/tgrabiec/src/osv/bsd/sys/netinet/tcp_input.cc:1075
#6 0x0000000000271f4f in tcp_net_channel_packet (m=0xffffa0017f20e500, tp=0xffffa00130062800) at /home/tgrabiec/src/osv/bsd/sys/netinet/tcp_input.cc:3210
#7 operator() (m=0xffffa0017f20e500, __closure=<optimized out>) at /home/tgrabiec/src/osv/bsd/sys/netinet/tcp_input.cc:3229
#8 std::_Function_handler<void(mbuf*), tcp_setup_net_channel(tcpcb*, ifnet*)::__lambda7>::_M_invoke(const std::_Any_data &, mbuf *) (__functor=..., __args#0=0xffffa0017f20e500) at /home/tgrabiec/src/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2071
#9 0x00000000003e5c5f in operator() (__args#0=<optimized out>, this=0xffff90012ff0b000) at /home/tgrabiec/src/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2464
#10 net_channel::process_queue (this=0xffff90012ff0b000) at /home/tgrabiec/src/osv/core/net_channel.cc:37
#11 0x000000000027ba00 in tcp_timer_rexmt (timer=..., tp=0xffffa00130062800) at /home/tgrabiec/src/osv/bsd/sys/netinet/tcp_timer.cc:478
#12 0x00000000003ea582 in async::timer_task::fire (this=this@entry=0xffffa00103132a10, task=...) at /home/tgrabiec/src/osv/core/async.cc:360
#13 0x00000000003eb36b in fire (task=..., this=0xffff800100588040) at /home/tgrabiec/src/osv/core/async.cc:227
#14 async::async_worker::run (this=0xffff800100588040) at /home/tgrabiec/src/osv/core/async.cc:175
#15 0x00000000003caa0b in main (this=0xffff800100588740) at /home/tgrabiec/src/osv/core/sched.cc:935
#16 sched::thread_main_c (t=0xffff800100588740) at /home/tgrabiec/src/osv/arch/x64/arch-switch.hh:137
#17 0x000000000037a616 in thread_main () at /home/tgrabiec/src/osv/arch/x64/entry.S:113
The inp is dropped at this point:
gdb$ p inp->inp_flags
$10 = 0x4000000
The problem is that the retransmission timer fires after the socket has been closed (this by itself is ok) while there are still unprocessed packets in the net channel. Those packets take the fast path into tcp_do_segment. Looking at tcp_input(), it sends an RST before calling tcp_do_segment when the socket is in the closed state. I think we should replicate this in our net channel fast path too. I will try to come up with a patch for that.
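To make the proposed check concrete, here is a minimal, self-contained sketch of the decision the fast path would need to make before handing a queued packet to tcp_do_segment. It is not the actual patch. `fake_inpcb`, `fast_path_action` and `classify` are hypothetical names invented for illustration, and the flag value is assumed to be FreeBSD's `INP_DROPPED` (0x04000000 in in_pcb.h), which matches the `inp_flags` value printed by gdb above. A real fix might test the TCP state of the tcpcb rather than this flag.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: same value as FreeBSD's INP_DROPPED; matches the gdb output.
constexpr std::uint32_t INP_DROPPED = 0x04000000;

// Hypothetical stand-in for the real inpcb; only the flags matter here.
struct fake_inpcb {
    std::uint32_t inp_flags;
};

// Hypothetical outcome of the check, not an OSv or FreeBSD type.
enum class fast_path_action { process_segment, drop_with_reset };

// Hypothetical helper: decide what to do with a packet that is still
// queued in the net channel. If the connection was already dropped,
// answer with RST instead of running the normal segment processing.
constexpr fast_path_action classify(const fake_inpcb& inp)
{
    return (inp.inp_flags & INP_DROPPED)
        ? fast_path_action::drop_with_reset
        : fast_path_action::process_segment;
}

// Small usage example: the flags from the crash lead to a reset,
// a connection with no flags set is processed normally.
inline void self_check()
{
    assert(classify(fake_inpcb{0x4000000}) == fast_path_action::drop_with_reset);
    assert(classify(fake_inpcb{0}) == fast_path_action::process_segment);
}
```

In the real code the check would presumably sit in tcp_net_channel_packet, ahead of the call into tcp_do_segment, and the reset branch would also have to free the mbuf.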
@slivne @tgrabiec Did this get resolved in 0.14? I think an elusive bug we've been encountering is caused by this or something that closely resembles it 😦
Unfortunately, if it had been resolved by a specific commit, this commit would have been mentioned here, and the issue would have been automatically closed. So I'm afraid that unless we were lucky and some other fix fixed it, this bug is still open :-(
What does fixing it look like? Are all you folks busy with the ScyllaDB release or is this sort of thing still a priority? I have minimal knowledge to help technically here, but perhaps I can help in other ways? It's unclear from the post if @tgrabiec has a simple repro of the issue, but if he doesn't, perhaps I can get one.
Is there any insight into whether this is better addressed in the RCU-related changes (introduced in #383, also referenced in #378), or is the advice above sound: simply replicating the RST behavior in the net channel fast path?
I got this after running workloadf on my laptop.