
sambamba markdup deadlock #189

Closed
tetron opened this issue Feb 11, 2016 · 27 comments

@tetron

tetron commented Feb 11, 2016

We are experiencing a deadlock in sambamba markdup v0.5.8. When it hits, the process is completely stuck and never makes progress. We have seen it with both -t 1 and -t 16, although it seems to be more prevalent with -t 16.

$ sambamba_v0.5.8 markdup --overflow-list-size=500000 -t 1 sorted.bam dedup.bam
finding positions of the duplicate reads in the file...

After some indeterminate amount of time (could be 10 minutes, could be 40 minutes) CPU usage drops to zero and nothing is reported past the first log message. strace shows that all threads are waiting on a mutex:

# strace -f -p 40280
Process 40280 attached with 17 threads
[pid 40296] rt_sigsuspend(~[USR2 RTMIN RT_1] <unfinished ...>
[pid 40295] rt_sigsuspend(~[USR2 RTMIN RT_1] <unfinished ...>
[pid 40294] rt_sigsuspend(~[USR2 RTMIN RT_1] <unfinished ...>
[pid 40293] rt_sigsuspend(~[USR2 RTMIN RT_1] <unfinished ...>
[pid 40292] rt_sigsuspend(~[USR2 RTMIN RT_1] <unfinished ...>
[pid 40291] rt_sigsuspend(~[USR2 RTMIN RT_1] <unfinished ...>
[pid 40290] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719597, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 40290] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719611, NULL <unfinished ...>
[pid 40289] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719603, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 40289] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719611, NULL <unfinished ...>
[pid 40288] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719607, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 40288] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719611, NULL <unfinished ...>
[pid 40287] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719599, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 40287] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719611, NULL <unfinished ...>
[pid 40286] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719611, NULL <unfinished ...>
[pid 40285] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719596, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 40285] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719611, NULL <unfinished ...>
[pid 40284] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719605, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 40284] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719611, NULL <unfinished ...>
[pid 40283] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719606, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 40283] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719611, NULL <unfinished ...>
[pid 40282] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719608, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 40282] futex(0x7f2f42b2049c, FUTEX_WAIT_PRIVATE, 21719611, NULL <unfinished ...>
[pid 40281] futex(0x8371f0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 40280] futex(0x7f2f44f8969c, FUTEX_WAIT_PRIVATE, 773943, NULL
@tetron
Author

tetron commented Feb 11, 2016

Actually, it appears to be ignoring "-t 1" and still creates 16 threads. The "-t" option only seems to have an effect when it is greater than the number of cores on the machine (so "-t 32" results in 32 threads, but "-t 1" still gets 16).
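
As a quick sanity check (not from the original report), the number of threads a process has actually spawned can be read from /proc on Linux; substituting the sambamba PID for the shell's own PID (`$$`) below shows whether -t is being honoured:

```shell
# Count the threads of a running process via /proc (Linux-specific).
# $$ (this shell) is a stand-in; substitute the sambamba PID to check
# how many threads -t actually produced.
nthreads=$(ls /proc/$$/task | wc -l)
echo "threads: $nthreads"
```

The same count is also visible as the `Threads:` field in `/proc/<pid>/status`.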

@tetron
Author

tetron commented Feb 11, 2016

From top, running with -t 1

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                    
58772 1000      20   0  647576 405644   3116 S 65.2  0.7   0:12.04 sambamba_v0.5.8                                                            
58771 1000      20   0  647576 405644   3116 R 52.2  0.7   0:09.27 sambamba_v0.5.8                                                            
58778 1000      20   0  647576 405644   3116 S  6.0  0.7   0:00.71 sambamba_v0.5.8                                                            
58784 1000      20   0  647576 405644   3116 S  6.0  0.7   0:00.72 sambamba_v0.5.8                                                            
58785 1000      20   0  647576 405644   3116 S  5.7  0.7   0:00.64 sambamba_v0.5.8                                                            
58776 1000      20   0  647576 405644   3116 S  4.3  0.7   0:00.56 sambamba_v0.5.8                                                            
58773 1000      20   0  647576 405644   3116 S  4.0  0.7   0:00.58 sambamba_v0.5.8                                                            
58775 1000      20   0  647576 405644   3116 S  4.0  0.7   0:00.63 sambamba_v0.5.8                                                            
58779 1000      20   0  647576 405644   3116 S  4.0  0.7   0:00.62 sambamba_v0.5.8                                                            
58777 1000      20   0  647576 405644   3116 S  3.3  0.7   0:00.54 sambamba_v0.5.8                                                            
58787 1000      20   0  647576 405644   3116 S  3.3  0.7   0:00.57 sambamba_v0.5.8                                                            
58783 1000      20   0  647576 405644   3116 S  3.0  0.7   0:00.60 sambamba_v0.5.8                                                            
58781 1000      20   0  647576 405644   3116 S  2.7  0.7   0:00.66 sambamba_v0.5.8                                                            
58786 1000      20   0  647576 405644   3116 S  2.7  0.7   0:00.58 sambamba_v0.5.8                                                            
58774 1000      20   0  647576 405644   3116 S  2.3  0.7   0:00.65 sambamba_v0.5.8                                                            
58780 1000      20   0  647576 405644   3116 S  2.0  0.7   0:00.46 sambamba_v0.5.8                                                            
58782 1000      20   0  647576 405644   3116 S  1.7  0.7   0:00.51 sambamba_v0.5.8                                                  

I understand there is a main thread and a single task pool thread, but what are all those other threads doing?

@tetron
Author

tetron commented Feb 11, 2016

We are running the v0.5.8 release binary for Linux x64.

@lomereiter
Contributor

Thanks for the excellent report!

There are two task pools created here, one of which mostly sits idle. I fixed this in the markdup-extsort branch (9867d3a).
As to the deadlock, I unfortunately have little idea (I suspect a bug in the GC, since the view tool became more stable after moving to manual memory management). Meanwhile, I can recommend https://github.com/gt1/biobambam, from which I borrowed the algorithm.

@lomereiter
Contributor

Could it be that you're hitting this bug? http://www.infoq.com/news/2015/05/redhat-futex
Please check your kernel version.

@brettcs

brettcs commented Feb 15, 2016

On the system in question, we're running Linux 3.19.0-49-generic on Ubuntu 14.04. It's a little hard to tell from the article exactly what's affected, but since the bug was introduced in Linux 3.10, it seems we're probably new enough to be safe.
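
For what it's worth, a version-aware kernel check can be scripted with `sort -V`. The 3.18 threshold below assumes the mainline fix point discussed later in this thread; distro kernels often backport the fix earlier, so treat this only as a first-pass hint:

```shell
# Compare the running kernel against 3.18, where the futex_wait fix landed
# in mainline (distro kernels may have backported it to older versions).
kver=$(uname -r | cut -d- -f1)
if [ "$(printf '%s\n' "$kver" 3.18 | sort -V | head -n1)" = "3.18" ]; then
    echo "kernel $kver is >= 3.18: mainline fix present"
else
    echo "kernel $kver is < 3.18: check your distro changelog for a backport"
fi
```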

@andyrepton

Hi there,

We are hitting the same bug on certain systems. We've checked the kernel bug in question and confirmed that our systems include the patch. In addition to the hang, we get the following stack trace in /var/log/messages:

May 12 21:41:14 node1 kernel: ------------[ cut here ]------------
May 12 21:41:14 node1 kernel: WARNING: at /home/builder/builds/kernel/3.10.0-327.13.1.el7/20160331160149/BUILD/kernel-3.10.0-327.13.1.el7/linux-3.10.0-327.13.1.el7.x86_64/arch/x86/include/asm/thread_info.h:249 sigsuspend+0x6d/0x70()
May 12 21:41:14 node1 kernel: Modules linked in: binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_mod ppdev kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd parport_pc sg parport pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_net cirrus syscopyarea ata_piix sysfillrect sysimgblt drm_kms_helper crct10dif_pclmul crct10dif_common ttm crc32c_intel drm virtio_pci serio_raw virtio_ring libata i2c_core virtio floppy
May 12 21:41:14 node1 kernel: CPU: 5 PID: 25158 Comm: sambamba Not tainted 3.10.0-327.13.1.el7.x86_64 #1
May 12 21:41:14 node1 kernel: Hardware name: Mission Critical Cloud Cosmic KVM Hypervisor, BIOS seabios-1.7.5-11.el7 04/01/2014
May 12 21:41:14 node1 kernel: 0000000000000000 000000009b9f5527 ffff880422b33ef8 ffffffff8163571c
May 12 21:41:14 node1 kernel: ffff880422b33f30 ffffffff8107b200 ffff8816e8577300 0000000000000000
May 12 21:41:14 node1 kernel: 00002ab03fb059c0 00002ab03fb039f8 00002ab03d4b1d00 ffff880422b33f40
May 12 21:41:14 node1 kernel: Call Trace:
May 12 21:41:14 node1 kernel: [<ffffffff8163571c>] dump_stack+0x19/0x1b
May 12 21:41:14 node1 kernel: [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
May 12 21:41:14 node1 kernel: [<ffffffff8107b34a>] warn_slowpath_null+0x1a/0x20
May 12 21:41:14 node1 kernel: [<ffffffff81094a6d>] sigsuspend+0x6d/0x70
May 12 21:41:14 node1 kernel: [<ffffffff81094abf>] SyS_rt_sigsuspend+0x4f/0x70
May 12 21:41:14 node1 kernel: [<ffffffff81645ec9>] system_call_fastpath+0x16/0x1b
May 12 21:41:14 node1 kernel: ---[ end trace 424a8a826b4af3f6 ]---

With the same input BAMs it fails repeatedly, but regenerating the BAMs often fixes the issue. We're still debugging, but we've also noticed that the error doesn't happen on XenServer hypervisors, only on KVM. Is there anything we could do to generate more information?

Thanks in advance!

@lomereiter
Contributor

lomereiter commented May 18, 2016

@Seth-Karlo thanks for putting effort into debugging, 'repeatedly' sounds promising to me.
Could you run it several times with --show-progress flag and see if it stops in the same place or a different one each time?

@andyrepton

Sure thing. I'm waiting for another run to fail again; once it does I'll run the test and report back my findings.

@rtnh

rtnh commented May 20, 2016

@lomereiter I'm a colleague of Seth-Karlo and followed up on one of the hanging sambamba processes today. I've taken some log snippets and created a stack trace for the deadlocked process.
You can find it in this gist: https://gist.github.com/rtnh/e2eab6afa7c0a37dbc96578d0f73c540

I've also rescheduled the same job with --show-progress, only to find it finished without fault:
finding positions of the duplicate reads in the file...
[==============================================================================]
sorting 460459648 end pairs... done in 25973 ms
sorting 2928598 single ends (among them 0 unmatched pairs)... done in 48 ms
collecting indices of duplicate reads... done in 6897 ms
found 72689016 duplicates, sorting the list... done in 722 ms
collected list of positions in 36 min 4 sec
marking duplicates...
[==============================================================================][ ]
total time elapsed: 75 min 9 sec

I hope the stack trace can help you root cause the problem.
If you can use any additional information, please let me know.

@WalterWaldron

It would be good to have the strace output from just prior to the deadlock (for context).
Perhaps limiting strace to futex and signal syscalls and only keeping the last few hundred lines would be sufficient.
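
A sketch of such an invocation (the PID is a placeholder; this assumes strace is installed and ptrace attach is permitted):

```shell
# Hypothetical invocation: attach to the stuck process, trace only futex
# and signal syscalls, and keep the last 300 lines for context.
pid=40280   # placeholder PID: substitute the sambamba process ID
if command -v strace >/dev/null 2>&1; then
    strace -f -e trace=futex,signal -p "$pid" 2>&1 | tail -n 300 > futex-tail.log
fi
```

Note that `tail -n` only emits its lines at end of input, so the log is written when strace detaches (Ctrl-C or `timeout`), not continuously.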

@rtnh

rtnh commented May 27, 2016

I've tried to get more data on the deadlock, but it only seems to confirm that we're still hitting the previously mentioned kernel bug. This article says attaching gdb or strace will wake up the application: https://access.redhat.com/solutions/1386323
In our case it doesn't. However, if we run the entire markdup with strace attached to the process, the deadlock never occurs. We've had 5 or 6 successful runs so far confirming this behaviour.

@lomereiter
Contributor

The absence of errors with strace attached could be explained simply by the process being slowed down, so that the probability of multithreading-related errors is lower.
Are you limiting its scope as suggested above? (strace -e trace=futex,signal)

@rtnh

rtnh commented Jun 1, 2016

For now I've been running summarized straces on the processes, just to keep them running.
This does give some insight into what a 'limited' scope would capture, though. All of the process summaries look a lot like the one below, so a limited scope would only filter out about 1% of the syscalls.
Process 6557 attached with 16 threads
% time     seconds  usecs/call     calls    errors syscall
------ ------------ ----------- --------- --------- -------------
 98.65 21938.838735         109 200371614  39874878 futex
  0.56   124.005807        1004    123566           read
  0.39    86.675194          56   1546677           write
  0.38    85.132016         786    108270    108270 rt_sigsuspend
  0.01     1.802600         984      1832           close
  0.00     0.937127           4    216540           tgkill
  0.00     0.248766        1244       200           munmap
  0.00     0.138527        4074        34           mremap
  0.00     0.130571          71      1832           open
  0.00     0.086016         143       600           unlink
  0.00     0.071378           0    216540    110137 rt_sigreturn
  0.00     0.004750           1      3442           lseek
  0.00     0.003963           7       608       600 lstat
  0.00     0.000552           2       300           fstat
  0.00     0.000000           0        72           mmap
  0.00     0.000000           0        18           brk
  0.00     0.000000           0        15           madvise
------ ------------ ----------- --------- --------- -------------
100.00 22238.076002             202592160  40093885 total

@austindoupnik

We also seem to be seeing this issue on Linux ip-10-0-3-66 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-1 (2016-03-06) x86_64 GNU/Linux.

Here is the strace dump; our process is also not woken by attaching strace. We might try gdb in a bit.

Process 117238 attached with 56 threads
[pid 117301] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117300] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117299] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117298] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117297] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117296] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117295] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117294] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117293] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117292] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117291] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117290] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117289] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117288] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117287] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117286] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117285] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117284] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117283] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117282] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117281] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117280] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117279] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117278] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117277] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117276] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117275] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117274] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117273] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117272] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117271] rt_sigsuspend(~[USR2 RTMIN RT_1], 8 <unfinished ...>
[pid 117262] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115714, NULL <unfinished ...>
[pid 117261] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115733, NULL <unfinished ...>
[pid 117262] <... futex resumed> )      = -1 EAGAIN (Resource temporarily unavailable)
[pid 117262] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115733, NULL <unfinished ...>
[pid 117260] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115724, NULL <unfinished ...>
[pid 117259] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115717, NULL <unfinished ...>
[pid 117260] <... futex resumed> )      = -1 EAGAIN (Resource temporarily unavailable)
[pid 117259] <... futex resumed> )      = -1 EAGAIN (Resource temporarily unavailable)
[pid 117260] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115733, NULL <unfinished ...>
[pid 117259] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115733, NULL <unfinished ...>
[pid 117258] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115723, NULL <unfinished ...>
[pid 117257] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115727, NULL <unfinished ...>
[pid 117258] <... futex resumed> )      = -1 EAGAIN (Resource temporarily unavailable)
[pid 117257] <... futex resumed> )      = -1 EAGAIN (Resource temporarily unavailable)
[pid 117258] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115733, NULL <unfinished ...>
[pid 117257] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115733, NULL <unfinished ...>
[pid 117256] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115713, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 117255] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115722, NULL <unfinished ...>
[pid 117256] futex(0x7f2847ecca9c, FUTEX_WAIT_PRIVATE, 1115733, NULL <unfinished ...>

@MartinNowak

Thanks @rtnh for the nice gist with backtraces (from gdb, I guess, right?).
I assume all the other deadlocks occur in similar situations: rt_sigsuspend is the GC thread suspend handler, and the futex wait is the condition wait of std.parallelism's work queue.
Your gist @rtnh is with a 3.10 kernel version, which is affected by the above-mentioned bug; see the version tags of torvalds/linux@b0c29f7. From the backtrace it looks a lot like a race condition between pthread_cond_wait and receiving a signal; threads waiting on mutexes/conditions should always wake up and process the signal.
There is another related kernel bug with similar symptoms ("Nodes appear unresponsive due to a Linux futex_wait() kernel bug"); this one was fixed in 3.18 (torvalds/linux@76835b0).

@MartinNowak

Couldn't reproduce this with sambamba 0.5.8, VirtualBox, a Core i7-6700, and debian-7.4/linux 3.2.0.

@sambrightman
Collaborator

sambrightman commented Nov 2, 2016

@MartinNowak what makes you think this is broken in the 3.10 version mentioned in the gist? Both issues appear to have their fixes included:

[sam@Sams-MacBook-Pro Downloads]$ rpm -qp kernel-3.10.0-327.18.2.el7.src.rpm --changelog | grep futex
warning: kernel-3.10.0-327.18.2.el7.src.rpm: NOKEY, key ID f4a80eb5
- [kernel] futex: Remove bogus hrtimer_active() check (Prarit Bhargava) [1217140]
- [perf] bench futex: Fix hung wakeup tasks after requeueing (Jiri Olsa) [1222189]
- [kernel] futex: Mention key referencing differences between shared and private futexes (Larry Woodman) [1205862]
- [kernel] futex: Ensure get_futex_key_refs() always implies a barrier (Larry Woodman) [1205862]
- [tools] perf/bench/futex: Sanitize -q option in requeue (Jiri Olsa) [1169436]
- [tools] perf/bench/futex: Support operations for shared futexes (Jiri Olsa) [1169436]
- [tools] perf/bench/futex: Use global --repeat option (Jiri Olsa) [1169436]
- [tools] perf/bench: Update manpage to mention numa and futex (Jiri Olsa) [1134356]
- [tools] perf/bench: Add futex-requeue microbenchmark (Jiri Olsa) [1134356]
- [tools] perf/bench: Add futex-wake microbenchmark (Jiri Olsa) [1134356]
- [tools] perf/bench: Add futex-hash microbenchmark (Jiri Olsa) [1134356]
- [kernel] futex: Make lookup_pi_state more robust (Larry Woodman) [1104520] {CVE-2014-3153}
- [kernel] futex: Always cleanup owner tid in unlock_pi (Larry Woodman) [1104520] {CVE-2014-3153}
- [kernel] futex: Validate atomic acquisition in futex_lock_pi_atomic() (Larry Woodman) [1104520] {CVE-2014-3153}
- [kernel] futex: prevent requeue pi on same futex (Larry Woodman) [1104520] {CVE-2014-3153}
- [kernel] futex: Fix pthread_cond_broadcast() to wake up all threads (Larry Woodman) [1084757]
- [kernel] futex: revert back to the explicit waiter counting code (Larry Woodman) [1081100]
- [kernel] futexes: Fix futex_hashsize initialization (Larry Woodman) [1069800]
- [kernel] futexes: Avoid taking the hb->lock if there's nothing to wake up (Larry Woodman) [1069800]
- [kernel] futexes: Document multiprocessor ordering guarantees (Larry Woodman) [1069800]
- [kernel] futexes: Increase hash table size for better performance (Larry Woodman) [1069800]
- [kernel] futexes: Clean up various details (Larry Woodman) [1069800]
- [kernel] futex: move user address verification up to common code (Larry Woodman) [1069800]
- [kernel] futex: fix handling of read-only-mapped hugepages (Larry Woodman) [1069800]
- [tools] perf/trace: Add beautifier for futex 'operation' parm (Jiri Olsa) [1036665]
- [kernel] futex: use freezable blocking call (Myron Stowe) [991615]

If you grab the SRPM and apply the patches, you can also see the relevant changes in kernel/futex.c. The host kernel @rtnh mentions (Linux 3.19.0-49-generic) also appears to contain both fixes. There's no Ubuntu release version mentioned, so it's hard to be completely sure, but https://launchpad.net/ubuntu/+source/linux-lts-vivid/3.19.0-49.55~14.04.1 at least contains them.

@pjotrp
Member

pjotrp commented Feb 24, 2017

I am willing to look at this bug if anyone is still facing it.

@pjotrp pjotrp self-assigned this Feb 24, 2017
@pjotrp pjotrp closed this as completed Nov 8, 2017
@isthisthat

This is still an issue for us, causing a lot of seemingly stochastic failures, so it's hard to pin down. Samtools has recently introduced a markdup subcommand, so we've switched to that.

@pjotrp
Member

pjotrp commented Jan 17, 2018

Thanks for reporting. Reopened because I am rewriting markdup combined with subsampling.

@pjotrp pjotrp reopened this Jan 17, 2018
@isthisthat

That would be ideal!

@pjotrp
Member

pjotrp commented Feb 8, 2018

@isthisthat can you tell me what CPU markdup is crashing on? Is it a Xeon 26xx series?

@blmoore
Copy link

blmoore commented Feb 8, 2018

Hi @pjotrp (I've been looking at this with @isthisthat): yes, via c4 AWS EC2 instances. Just to clarify: it's not crashing but deadlocking indefinitely.

@pjotrp
Member

pjotrp commented Feb 8, 2018

These are Intel Xeon E5-2666 v3 and should not suffer from the particular hardware bug we encountered in #335. It is interesting that it only appears on KVM instances (reported above). I think we ought to try the new release coming out in the next few days, since it uses the latest LDC and LLVM. If that does not fix it, we can consider replacing the reader, which appears to be where the deadlock occurs. I have written a new reader, and it would be interesting to try it.

@pjotrp
Member

pjotrp commented Oct 17, 2018

Can you check whether the new release still shows this behaviour?

@dulunar

dulunar commented Feb 10, 2021

sambamba version: 0.7.1, binary downloaded from the releases page and uncompressed for use.

LSB Version:    core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch:core-4.0-amd64:core-4.0-noarch:core-4.1-amd64:core-4.1-noarch:security-4.0-amd64:security-4.0-noarch:security-4.1-amd64:security-4.1-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.6 LTS
Release:        14.04
Codename:       trusty

On Feb 10 at 01:39 I ran the following command:

$ /home/luna/Desktop/Software/sambamba/build/sambamba markdup -t 16 -l 9 --tmpdir ./ --sort-buffer-size 4096 ./Bulk.mem.sort.bam ./Bulk.mem.sort.mkdup.bam

But it deadlocked at Feb 10 01:40. There are just 42 BAM files in the temporary directory:

$ ls ~/work/TempChimera/Bulk/sambatmp/sambamba-pid70545-markdup-oerh
PairedEndsInforbsv0  sorted.13.bam.idx  sorted.18.bam.idx  sorted.22.bam.idx  sorted.27.bam.idx  sorted.31.bam.idx  sorted.36.bam.idx  sorted.40.bam.idx  sorted.6.bam.idx
sorted.0.bam         sorted.14.bam      sorted.19.bam      sorted.23.bam      sorted.28.bam      sorted.32.bam      sorted.37.bam      sorted.41.bam      sorted.7.bam
sorted.0.bam.idx     sorted.14.bam.idx  sorted.19.bam.idx  sorted.23.bam.idx  sorted.28.bam.idx  sorted.32.bam.idx  sorted.37.bam.idx  sorted.41.bam.idx  sorted.7.bam.idx
sorted.10.bam        sorted.15.bam      sorted.1.bam       sorted.24.bam      sorted.29.bam      sorted.33.bam      sorted.38.bam      sorted.42.bam      sorted.8.bam
sorted.10.bam.idx    sorted.15.bam.idx  sorted.1.bam.idx   sorted.24.bam.idx  sorted.29.bam.idx  sorted.33.bam.idx  sorted.38.bam.idx  sorted.42.bam.idx  sorted.8.bam.idx
sorted.11.bam        sorted.16.bam      sorted.20.bam      sorted.25.bam      sorted.2.bam       sorted.34.bam      sorted.39.bam      sorted.4.bam       sorted.9.bam
sorted.11.bam.idx    sorted.16.bam.idx  sorted.20.bam.idx  sorted.25.bam.idx  sorted.2.bam.idx   sorted.34.bam.idx  sorted.39.bam.idx  sorted.4.bam.idx   sorted.9.bam.idx
sorted.12.bam        sorted.17.bam      sorted.21.bam      sorted.26.bam      sorted.30.bam      sorted.35.bam      sorted.3.bam       sorted.5.bam
sorted.12.bam.idx    sorted.17.bam.idx  sorted.21.bam.idx  sorted.26.bam.idx  sorted.30.bam.idx  sorted.35.bam.idx  sorted.3.bam.idx   sorted.5.bam.idx
sorted.13.bam        sorted.18.bam      sorted.22.bam      sorted.27.bam      sorted.31.bam      sorted.36.bam      sorted.40.bam      sorted.6.bam

Checking /var/log/messages, I found this:

Feb 10 01:46:09 bioinfo vmunix: [2800262.044005] ------------[ cut here ]------------
Feb 10 01:46:09 bioinfo vmunix: [2800262.044027] WARNING: CPU: 13 PID: 70577 at /build/linux-lts-xenial-lRzcrX/linux-lts-xenial-4.4.0/arch/x86/include/asm/thread_info.h:226
sigsuspend+0x6d/0x70()
Feb 10 01:46:09 bioinfo vmunix: [2800262.044031] Modules linked in: nfsv3 xt_multiport ipmi_devintf ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf
_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tabl
es nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_pow
erclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_ssif lrw gf128mul glue_helper ablk_helper cryptd joyde
v input_leds ast ttm drm_kms_helper sb_edac drm edac_core fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me mei lpc_ich shpchp wmi ipmi_si 8250_fintek rfcomm ipmi_msghand
ler bnep bluetooth parport_pc ppdev acpi_pad mac_hid knem(OE) lp parport nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(
OE) iw_cm(
Feb 10 01:46:09 bioinfo vmunix: onfigfs ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_ib(OE) ib_uverbs(OE) ib_core(OE) mlx4
_en(OE) vxlan ip6_udp_tunnel udp_tunnel raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic mlx4_core(OE) igb mlx_compat(OE) usbhid mpt
3sas i2c_algo_bit ahci dca raid_class hid libahci ptp scsi_transport_sas raid6_pq pps_core libcrc32c raid1 raid0 multipath linear fjes
Feb 10 01:46:09 bioinfo vmunix: [2800262.044237] CPU: 13 PID: 70577 Comm: sambamba Tainted: G        W  OE   4.4.0-148-generic #174~14.04.1-Ubuntu
Feb 10 01:46:09 bioinfo vmunix: [2800262.044242] Hardware name: Sugon I840-G20/80B32-U4/H, BIOS 2.57 05/15/2018
Feb 10 01:46:09 bioinfo vmunix: [2800262.044246]  0000000000000000 ffff881625847ed0 ffffffff813eee37 0000000000000000
Feb 10 01:46:09 bioinfo vmunix: [2800262.044255]  ffffffff81cc5118 ffff881625847f08 ffffffff810829f6 ffff882038e82a00
Feb 10 01:46:09 bioinfo vmunix: [2800262.044263]  00007fc2c57f8c50 00007fc32c6f12b8 00007fc2c57f85a8 0000000000000001
Feb 10 01:46:09 bioinfo vmunix: [2800262.044270] Call Trace:
Feb 10 01:46:09 bioinfo vmunix: [2800262.044288]  [<ffffffff813eee37>] dump_stack+0x63/0x8c
Feb 10 01:46:09 bioinfo vmunix: [2800262.044297]  [<ffffffff810829f6>] warn_slowpath_common+0x86/0xc0
Feb 10 01:46:09 bioinfo vmunix: [2800262.044303]  [<ffffffff81082aea>] warn_slowpath_null+0x1a/0x20
Feb 10 01:46:09 bioinfo vmunix: [2800262.044310]  [<ffffffff81092aed>] sigsuspend+0x6d/0x70
Feb 10 01:46:09 bioinfo vmunix: [2800262.044318]  [<ffffffff81094140>] SyS_rt_sigsuspend+0x40/0x50
Feb 10 01:46:09 bioinfo vmunix: [2800262.044332]  [<ffffffff8182d61b>] entry_SYSCALL_64_fastpath+0x22/0xcb
Feb 10 01:46:09 bioinfo vmunix: [2800262.044337] ---[ end trace 042f2041827a6556 ]---

I tried version 0.8.0 as well; it still deadlocked. The log:

Feb 10 19:17:10 bioinfo vmunix: [2863321.656228] WARNING: CPU: 67 PID: 16791 at /build/linux-lts-xenial-lRzcrX/linux-lts-xenial-4.4.0/arch/x86/include/asm/thread_info.h:226
sigsuspend+0x6d/0x70()
Feb 10 19:17:10 bioinfo vmunix: [2863321.656231] Modules linked in: nfsv3 xt_multiport ipmi_devintf ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf
_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tabl
es nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_pow
erclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_ssif lrw gf128mul glue_helper ablk_helper cryptd joyde
v input_leds ast ttm drm_kms_helper sb_edac drm edac_core fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me mei lpc_ich shpchp wmi ipmi_si 8250_fintek rfcomm ipmi_msghand
ler bnep bluetooth parport_pc ppdev acpi_pad mac_hid knem(OE) lp parport nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(
OE) iw_cm(
Feb 10 19:17:10 bioinfo vmunix: onfigfs ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_ib(OE) ib_uverbs(OE) ib_core(OE) mlx4
_en(OE) vxlan ip6_udp_tunnel udp_tunnel raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic mlx4_core(OE) igb mlx_compat(OE) usbhid mpt
3sas i2c_algo_bit ahci dca raid_class hid libahci ptp scsi_transport_sas raid6_pq pps_core libcrc32c raid1 raid0 multipath linear fjes
Feb 10 19:17:10 bioinfo vmunix: [2863321.656402] CPU: 67 PID: 16791 Comm: sambamba Tainted: G        W  OE   4.4.0-148-generic #174~14.04.1-Ubuntu
Feb 10 19:17:10 bioinfo vmunix: [2863321.656405] Hardware name: Sugon I840-G20/80B32-U4/H, BIOS 2.57 05/15/2018
Feb 10 19:17:10 bioinfo vmunix: [2863321.656408]  0000000000000000 ffff88185a7f3ed0 ffffffff813eee37 0000000000000000
Feb 10 19:17:10 bioinfo vmunix: [2863321.656413]  ffffffff81cc5118 ffff88185a7f3f08 ffffffff810829f6 ffff8840380ee200
Feb 10 19:17:10 bioinfo vmunix: [2863321.656417]  000000000001a05b 00007f521f066018 00007f51c8ff75e8 00007f521f066144
Feb 10 19:17:10 bioinfo vmunix: [2863321.656422] Call Trace:
Feb 10 19:17:10 bioinfo vmunix: [2863321.656436]  [<ffffffff813eee37>] dump_stack+0x63/0x8c
Feb 10 19:17:10 bioinfo vmunix: [2863321.656443]  [<ffffffff810829f6>] warn_slowpath_common+0x86/0xc0
Feb 10 19:17:10 bioinfo vmunix: [2863321.656446]  [<ffffffff81082aea>] warn_slowpath_null+0x1a/0x20
Feb 10 19:17:10 bioinfo vmunix: [2863321.656450]  [<ffffffff81092aed>] sigsuspend+0x6d/0x70
Feb 10 19:17:10 bioinfo vmunix: [2863321.656455]  [<ffffffff81094140>] SyS_rt_sigsuspend+0x40/0x50
Feb 10 19:17:10 bioinfo vmunix: [2863321.656466]  [<ffffffff8182d61b>] entry_SYSCALL_64_fastpath+0x22/0xcb
Feb 10 19:17:10 bioinfo vmunix: [2863321.656469] ---[ end trace 042f2041827a655d ]---

Now I am trying version 0.6.6 to see if it works, but it still deadlocks. There are 298 temporary BAM files left in the working directory:

$ ls /home/luna/work/TempChimera/Bulk/sam/sambamba-pid17263-markdup-xxmk
PairedEndsInfocdty0      sorted.130.bam      sorted.165.bam      sorted.19.bam       sorted.233.bam      sorted.268.bam      sorted.32.bam      sorted.67.bam
PairedEndsInfocdty1      sorted.130.bam.idx  sorted.165.bam.idx  sorted.19.bam.idx   sorted.233.bam.idx  sorted.268.bam.idx  sorted.32.bam.idx  sorted.67.bam.idx
PairedEndsInfocdty2      sorted.131.bam      sorted.166.bam      sorted.1.bam        sorted.234.bam      sorted.269.bam      sorted.33.bam      sorted.68.bam
PairedEndsInfocdty3      sorted.131.bam.idx  sorted.166.bam.idx  sorted.1.bam.idx    sorted.234.bam.idx  sorted.269.bam.idx  sorted.33.bam.idx  sorted.68.bam.idx
PairedEndsInfocdty4      sorted.132.bam      sorted.167.bam      sorted.200.bam      sorted.235.bam      sorted.26.bam       sorted.34.bam      sorted.69.bam
PairedEndsInfocdty5      sorted.132.bam.idx  sorted.167.bam.idx  sorted.200.bam.idx  sorted.235.bam.idx  sorted.26.bam.idx   sorted.34.bam.idx  sorted.69.bam.idx
PairedEndsInfocdty6      sorted.133.bam      sorted.168.bam      sorted.201.bam      sorted.236.bam      sorted.270.bam      sorted.35.bam      sorted.6.bam
SingleEndBasicInfofozu0  sorted.133.bam.idx  sorted.168.bam.idx  sorted.201.bam.idx  sorted.236.bam.idx  sorted.270.bam.idx  sorted.35.bam.idx  sorted.6.bam.idx
sorted.0.bam             sorted.134.bam      sorted.169.bam      sorted.202.bam      sorted.237.bam      sorted.271.bam      sorted.36.bam      sorted.70.bam
sorted.0.bam.idx         sorted.134.bam.idx  sorted.169.bam.idx  sorted.202.bam.idx  sorted.237.bam.idx  sorted.271.bam.idx  sorted.36.bam.idx  sorted.70.bam.idx
sorted.100.bam           sorted.135.bam      sorted.16.bam       sorted.203.bam      sorted.238.bam      sorted.272.bam      sorted.37.bam      sorted.71.bam
sorted.100.bam.idx       sorted.135.bam.idx  sorted.16.bam.idx   sorted.203.bam.idx  sorted.238.bam.idx  sorted.272.bam.idx  sorted.37.bam.idx  sorted.71.bam.idx
sorted.101.bam           sorted.136.bam      sorted.170.bam      sorted.204.bam      sorted.239.bam      sorted.273.bam      sorted.38.bam      sorted.72.bam
sorted.101.bam.idx       sorted.136.bam.idx  sorted.170.bam.idx  sorted.204.bam.idx  sorted.239.bam.idx  sorted.273.bam.idx  sorted.38.bam.idx  sorted.72.bam.idx
sorted.102.bam           sorted.137.bam      sorted.171.bam      sorted.205.bam      sorted.23.bam       sorted.274.bam      sorted.39.bam      sorted.73.bam
sorted.102.bam.idx       sorted.137.bam.idx  sorted.171.bam.idx  sorted.205.bam.idx  sorted.23.bam.idx   sorted.274.bam.idx  sorted.39.bam.idx  sorted.73.bam.idx
sorted.103.bam           sorted.138.bam      sorted.172.bam      sorted.206.bam      sorted.240.bam      sorted.275.bam      sorted.3.bam       sorted.74.bam
sorted.103.bam.idx       sorted.138.bam.idx  sorted.172.bam.idx  sorted.206.bam.idx  sorted.240.bam.idx  sorted.275.bam.idx  sorted.3.bam.idx   sorted.74.bam.idx
sorted.104.bam           sorted.139.bam      sorted.173.bam      sorted.207.bam      sorted.241.bam      sorted.276.bam      sorted.40.bam      sorted.75.bam
sorted.104.bam.idx       sorted.139.bam.idx  sorted.173.bam.idx  sorted.207.bam.idx  sorted.241.bam.idx  sorted.276.bam.idx  sorted.40.bam.idx  sorted.75.bam.idx
sorted.105.bam           sorted.13.bam       sorted.174.bam      sorted.208.bam      sorted.242.bam      sorted.277.bam      sorted.41.bam      sorted.76.bam
sorted.105.bam.idx       sorted.13.bam.idx   sorted.174.bam.idx  sorted.208.bam.idx  sorted.242.bam.idx  sorted.277.bam.idx  sorted.41.bam.idx  sorted.76.bam.idx
sorted.106.bam           sorted.140.bam      sorted.175.bam      sorted.209.bam      sorted.243.bam      sorted.278.bam      sorted.42.bam      sorted.77.bam
sorted.106.bam.idx       sorted.140.bam.idx  sorted.175.bam.idx  sorted.209.bam.idx  sorted.243.bam.idx  sorted.278.bam.idx  sorted.42.bam.idx  sorted.77.bam.idx
sorted.107.bam           sorted.141.bam      sorted.176.bam      sorted.20.bam       sorted.244.bam      sorted.279.bam      sorted.43.bam      sorted.78.bam
sorted.107.bam.idx       sorted.141.bam.idx  sorted.176.bam.idx  sorted.20.bam.idx   sorted.244.bam.idx  sorted.279.bam.idx  sorted.43.bam.idx  sorted.78.bam.idx
sorted.108.bam           sorted.142.bam      sorted.177.bam      sorted.210.bam      sorted.245.bam      sorted.27.bam       sorted.44.bam      sorted.79.bam
sorted.108.bam.idx       sorted.142.bam.idx  sorted.177.bam.idx  sorted.210.bam.idx  sorted.245.bam.idx  sorted.27.bam.idx   sorted.44.bam.idx  sorted.79.bam.idx
sorted.109.bam           sorted.143.bam      sorted.178.bam      sorted.211.bam      sorted.246.bam      sorted.280.bam      sorted.45.bam      sorted.7.bam
sorted.109.bam.idx       sorted.143.bam.idx  sorted.178.bam.idx  sorted.211.bam.idx  sorted.246.bam.idx  sorted.280.bam.idx  sorted.45.bam.idx  sorted.7.bam.idx
sorted.10.bam            sorted.144.bam      sorted.179.bam      sorted.212.bam      sorted.247.bam      sorted.281.bam      sorted.46.bam      sorted.80.bam
sorted.10.bam.idx        sorted.144.bam.idx  sorted.179.bam.idx  sorted.212.bam.idx  sorted.247.bam.idx  sorted.281.bam.idx  sorted.46.bam.idx  sorted.80.bam.idx
sorted.110.bam           sorted.145.bam      sorted.17.bam       sorted.213.bam      sorted.248.bam      sorted.282.bam      sorted.47.bam      sorted.81.bam
sorted.110.bam.idx       sorted.145.bam.idx  sorted.17.bam.idx   sorted.213.bam.idx  sorted.248.bam.idx  sorted.282.bam.idx  sorted.47.bam.idx  sorted.81.bam.idx
sorted.111.bam           sorted.146.bam      sorted.180.bam      sorted.214.bam      sorted.249.bam      sorted.283.bam      sorted.48.bam      sorted.82.bam
sorted.111.bam.idx       sorted.146.bam.idx  sorted.180.bam.idx  sorted.214.bam.idx  sorted.249.bam.idx  sorted.283.bam.idx  sorted.48.bam.idx  sorted.82.bam.idx
sorted.112.bam           sorted.147.bam      sorted.181.bam      sorted.215.bam      sorted.24.bam       sorted.284.bam      sorted.49.bam      sorted.83.bam
sorted.112.bam.idx       sorted.147.bam.idx  sorted.181.bam.idx  sorted.215.bam.idx  sorted.24.bam.idx   sorted.284.bam.idx  sorted.49.bam.idx  sorted.83.bam.idx
sorted.113.bam           sorted.148.bam      sorted.182.bam      sorted.216.bam      sorted.250.bam      sorted.285.bam      sorted.4.bam       sorted.84.bam
sorted.113.bam.idx       sorted.148.bam.idx  sorted.182.bam.idx  sorted.216.bam.idx  sorted.250.bam.idx  sorted.285.bam.idx  sorted.4.bam.idx   sorted.84.bam.idx
sorted.114.bam           sorted.149.bam      sorted.183.bam      sorted.217.bam      sorted.251.bam      sorted.286.bam      sorted.50.bam      sorted.85.bam
sorted.114.bam.idx       sorted.149.bam.idx  sorted.183.bam.idx  sorted.217.bam.idx  sorted.251.bam.idx  sorted.286.bam.idx  sorted.50.bam.idx  sorted.85.bam.idx
sorted.115.bam           sorted.14.bam       sorted.184.bam      sorted.218.bam      sorted.252.bam      sorted.287.bam      sorted.51.bam      sorted.86.bam
sorted.115.bam.idx       sorted.14.bam.idx   sorted.184.bam.idx  sorted.218.bam.idx  sorted.252.bam.idx  sorted.287.bam.idx  sorted.51.bam.idx  sorted.86.bam.idx
sorted.116.bam           sorted.150.bam      sorted.185.bam      sorted.219.bam      sorted.253.bam      sorted.288.bam      sorted.52.bam      sorted.87.bam
sorted.116.bam.idx       sorted.150.bam.idx  sorted.185.bam.idx  sorted.219.bam.idx  sorted.253.bam.idx  sorted.288.bam.idx  sorted.52.bam.idx  sorted.87.bam.idx
sorted.117.bam           sorted.151.bam      sorted.186.bam      sorted.21.bam       sorted.254.bam      sorted.289.bam      sorted.53.bam      sorted.88.bam
sorted.117.bam.idx       sorted.151.bam.idx  sorted.186.bam.idx  sorted.21.bam.idx   sorted.254.bam.idx  sorted.289.bam.idx  sorted.53.bam.idx  sorted.88.bam.idx
sorted.118.bam           sorted.152.bam      sorted.187.bam      sorted.220.bam      sorted.255.bam      sorted.28.bam       sorted.54.bam      sorted.89.bam
sorted.118.bam.idx       sorted.152.bam.idx  sorted.187.bam.idx  sorted.220.bam.idx  sorted.255.bam.idx  sorted.28.bam.idx   sorted.54.bam.idx  sorted.89.bam.idx
sorted.119.bam           sorted.153.bam      sorted.188.bam      sorted.221.bam      sorted.256.bam      sorted.290.bam      sorted.55.bam      sorted.8.bam
sorted.119.bam.idx       sorted.153.bam.idx  sorted.188.bam.idx  sorted.221.bam.idx  sorted.256.bam.idx  sorted.290.bam.idx  sorted.55.bam.idx  sorted.8.bam.idx
sorted.11.bam            sorted.154.bam      sorted.189.bam      sorted.222.bam      sorted.257.bam      sorted.291.bam      sorted.56.bam      sorted.90.bam
sorted.11.bam.idx        sorted.154.bam.idx  sorted.189.bam.idx  sorted.222.bam.idx  sorted.257.bam.idx  sorted.291.bam.idx  sorted.56.bam.idx  sorted.90.bam.idx
sorted.120.bam           sorted.155.bam      sorted.18.bam       sorted.223.bam      sorted.258.bam      sorted.292.bam      sorted.57.bam      sorted.91.bam
sorted.120.bam.idx       sorted.155.bam.idx  sorted.18.bam.idx   sorted.223.bam.idx  sorted.258.bam.idx  sorted.292.bam.idx  sorted.57.bam.idx  sorted.91.bam.idx
sorted.121.bam           sorted.156.bam      sorted.190.bam      sorted.224.bam      sorted.259.bam      sorted.293.bam      sorted.58.bam      sorted.92.bam
sorted.121.bam.idx       sorted.156.bam.idx  sorted.190.bam.idx  sorted.224.bam.idx  sorted.259.bam.idx  sorted.293.bam.idx  sorted.58.bam.idx  sorted.92.bam.idx
sorted.122.bam           sorted.157.bam      sorted.191.bam      sorted.225.bam      sorted.25.bam       sorted.294.bam      sorted.59.bam      sorted.93.bam
sorted.122.bam.idx       sorted.157.bam.idx  sorted.191.bam.idx  sorted.225.bam.idx  sorted.25.bam.idx   sorted.294.bam.idx  sorted.59.bam.idx  sorted.93.bam.idx
sorted.123.bam           sorted.158.bam      sorted.192.bam      sorted.226.bam      sorted.260.bam      sorted.295.bam      sorted.5.bam       sorted.94.bam
sorted.123.bam.idx       sorted.158.bam.idx  sorted.192.bam.idx  sorted.226.bam.idx  sorted.260.bam.idx  sorted.295.bam.idx  sorted.5.bam.idx   sorted.94.bam.idx
sorted.124.bam           sorted.159.bam      sorted.193.bam      sorted.227.bam      sorted.261.bam      sorted.296.bam      sorted.60.bam      sorted.95.bam
sorted.124.bam.idx       sorted.159.bam.idx  sorted.193.bam.idx  sorted.227.bam.idx  sorted.261.bam.idx  sorted.296.bam.idx  sorted.60.bam.idx  sorted.95.bam.idx
sorted.125.bam           sorted.15.bam       sorted.194.bam      sorted.228.bam      sorted.262.bam      sorted.297.bam      sorted.61.bam      sorted.96.bam
sorted.125.bam.idx       sorted.15.bam.idx   sorted.194.bam.idx  sorted.228.bam.idx  sorted.262.bam.idx  sorted.297.bam.idx  sorted.61.bam.idx  sorted.96.bam.idx
sorted.126.bam           sorted.160.bam      sorted.195.bam      sorted.229.bam      sorted.263.bam      sorted.298.bam      sorted.62.bam      sorted.97.bam
sorted.126.bam.idx       sorted.160.bam.idx  sorted.195.bam.idx  sorted.229.bam.idx  sorted.263.bam.idx  sorted.298.bam.idx  sorted.62.bam.idx  sorted.97.bam.idx
sorted.127.bam           sorted.161.bam      sorted.196.bam      sorted.22.bam       sorted.264.bam      sorted.29.bam       sorted.63.bam      sorted.98.bam
sorted.127.bam.idx       sorted.161.bam.idx  sorted.196.bam.idx  sorted.22.bam.idx   sorted.264.bam.idx  sorted.29.bam.idx   sorted.63.bam.idx  sorted.98.bam.idx
sorted.128.bam           sorted.162.bam      sorted.197.bam      sorted.230.bam      sorted.265.bam      sorted.2.bam        sorted.64.bam      sorted.99.bam
sorted.128.bam.idx       sorted.162.bam.idx  sorted.197.bam.idx  sorted.230.bam.idx  sorted.265.bam.idx  sorted.2.bam.idx    sorted.64.bam.idx  sorted.99.bam.idx
sorted.129.bam           sorted.163.bam      sorted.198.bam      sorted.231.bam      sorted.266.bam      sorted.30.bam       sorted.65.bam      sorted.9.bam
sorted.129.bam.idx       sorted.163.bam.idx  sorted.198.bam.idx  sorted.231.bam.idx  sorted.266.bam.idx  sorted.30.bam.idx   sorted.65.bam.idx  sorted.9.bam.idx
sorted.12.bam            sorted.164.bam      sorted.199.bam      sorted.232.bam      sorted.267.bam      sorted.31.bam       sorted.66.bam
sorted.12.bam.idx        sorted.164.bam.idx  sorted.199.bam.idx  sorted.232.bam.idx  sorted.267.bam.idx  sorted.31.bam.idx   sorted.66.bam.idx

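With this many temporary split files open at once, one thing worth ruling out is the process's open-file limit (this is a guess on my part, not a confirmed cause; the directory path and PID 17263 are taken from the listing above):

```shell
# Compare the number of sambamba temp files against the shell's
# open-file soft limit. Adjust tmpdir to the actual temp directory.
tmpdir=/home/luna/work/TempChimera/Bulk/sam/sambamba-pid17263-markdup-xxmk
nfiles=$(ls "$tmpdir" 2>/dev/null | wc -l)
softlimit=$(ulimit -Sn)
echo "temp files: $nfiles, open-file soft limit: $softlimit"
```

For the running process itself, `grep 'Max open files' /proc/17263/limits` shows the limit actually in effect, which can differ from the shell's.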
The corresponding entries in /var/log/messages:

Feb 10 19:29:46 bioinfo vmunix: [2864077.314148] ------------[ cut here ]------------
Feb 10 19:29:46 bioinfo vmunix: [2864077.314167] WARNING: CPU: 13 PID: 17291 at /build/linux-lts-xenial-lRzcrX/linux-lts-xenial-4.4.0/arch/x86/include/asm/thread_info.h:226 sigsuspend+0x6d/0x70()
Feb 10 19:29:46 bioinfo vmunix: [2864077.314170] Modules linked in: nfsv3 xt_multiport ipmi_devintf ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 ipmi_ssif lrw gf128mul glue_helper ablk_helper cryptd joydev input_leds ast ttm drm_kms_helper sb_edac drm edac_core fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me mei lpc_ich shpchp wmi ipmi_si 8250_fintek rfcomm ipmi_msghandler bnep bluetooth parport_pc ppdev acpi_pad mac_hid knem(OE) lp parport nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(
Feb 10 19:29:46 bioinfo vmunix: onfigfs ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_ib(OE) ib_uverbs(OE) ib_core(OE) mlx4_en(OE) vxlan ip6_udp_tunnel udp_tunnel raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic mlx4_core(OE) igb mlx_compat(OE) usbhid mpt3sas i2c_algo_bit ahci dca raid_class hid libahci ptp scsi_transport_sas raid6_pq pps_core libcrc32c raid1 raid0 multipath linear fjes
Feb 10 19:29:46 bioinfo vmunix: [2864077.314320] CPU: 13 PID: 17291 Comm: sambamba.bak Tainted: G        W  OE   4.4.0-148-generic #174~14.04.1-Ubuntu
Feb 10 19:29:46 bioinfo vmunix: [2864077.314323] Hardware name: Sugon I840-G20/80B32-U4/H, BIOS 2.57 05/15/2018
Feb 10 19:29:46 bioinfo vmunix: [2864077.314325]  0000000000000000 ffff881d52a6fed0 ffffffff813eee37 0000000000000000
Feb 10 19:29:46 bioinfo vmunix: [2864077.314329]  ffffffff81cc5118 ffff881d52a6ff08 ffffffff810829f6 ffff8840360d0000
Feb 10 19:29:46 bioinfo vmunix: [2864077.314333]  0000000000936c01 00007efe9604d058 00007efe2a7fa6f8 00007efe96045f00
Feb 10 19:29:46 bioinfo vmunix: [2864077.314337] Call Trace:
Feb 10 19:29:46 bioinfo vmunix: [2864077.314350]  [<ffffffff813eee37>] dump_stack+0x63/0x8c
Feb 10 19:29:46 bioinfo vmunix: [2864077.314356]  [<ffffffff810829f6>] warn_slowpath_common+0x86/0xc0
Feb 10 19:29:46 bioinfo vmunix: [2864077.314359]  [<ffffffff81082aea>] warn_slowpath_null+0x1a/0x20
Feb 10 19:29:46 bioinfo vmunix: [2864077.314362]  [<ffffffff81092aed>] sigsuspend+0x6d/0x70
Feb 10 19:29:46 bioinfo vmunix: [2864077.314367]  [<ffffffff81094140>] SyS_rt_sigsuspend+0x40/0x50
Feb 10 19:29:46 bioinfo vmunix: [2864077.314378]  [<ffffffff8182d61b>] entry_SYSCALL_64_fastpath+0x22/0xcb
Feb 10 19:29:46 bioinfo vmunix: [2864077.314381] ---[ end trace 042f2041827a656c ]---

I don't know what happened; this command previously worked normally, but sometimes it would deadlock like this.
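Next time it hangs, it might help to capture full thread backtraces with gdb rather than strace, so we can see which mutex each thread is actually blocked on. A minimal sketch (the PID here is illustrative, and the gdb call itself is left commented since it needs root or the process owner):

```shell
# Dump backtraces of every thread in the stuck sambamba process.
pid=17263   # replace with the PID of the hung sambamba process
cmd="gdb -p $pid -batch -ex 'set pagination off' -ex 'thread apply all bt'"
echo "$cmd"
# eval "$cmd" > sambamba-backtrace.txt 2>&1
```

Attaching the resulting sambamba-backtrace.txt to this issue would give the maintainers much more to go on than the futex addresses alone.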
