
page allocation failure #631

Closed
ryao opened this issue Apr 2, 2012 · 4 comments

ryao commented Apr 2, 2012

I encountered a lockup of one of my GNU Screen windows while doing some file-related work in a KVM virtual machine with 6 cores and 3GB of RAM. This was on the Gentoo Linux 12.1 LiveDVD, which was compiled on the 31st; it should contain all commits up to and including fc41c64.

Here is the output:

[ 1945.637258] virtio-pci 0000:00:06.0: irq 44 for MSI/MSI-X
[ 1945.637314] virtio-pci 0000:00:06.0: irq 45 for MSI/MSI-X
[ 1945.667610] vda: unknown partition table
[ 1978.161279] SPL: Loaded module v0.6.0-rc8, using hostid 0xfb3f0db3
[ 1978.161418] znvpair: module license 'CDDL' taints kernel.
[ 1978.161420] Disabling lock debugging due to kernel taint
[ 1978.224062] ZFS: Loaded module v0.6.0-rc8, ZFS pool version 28, ZFS filesystem version 5
[ 1985.462765] vda: vda1 vda9
[ 2448.162738] z_wr_iss/2: page allocation failure: order:0, mode:0x20
[ 2448.162751] Pid: 32223, comm: z_wr_iss/2 Tainted: P O 3.3.0-gentoo-r1 #1
[ 2448.162757] Call Trace:
[ 2448.162762] [] ? warn_alloc_failed+0x10a/0x11d
[ 2448.162785] [] ? __alloc_pages_nodemask+0x611/0x66a
[ 2448.162796] [] ? try_to_wake_up+0x1fd/0x20f
[ 2448.162807] [] ? cache_alloc_refill+0x252/0x487
[ 2448.162814] [] ? try_to_wake_up+0x20f/0x20f
[ 2448.162823] [] ? __kmalloc+0x83/0xb0
[ 2448.162832] [] ? __alloc_skb+0x5b/0x11b
[ 2448.162841] [] ? tcp_send_ack+0x28/0xd5
[ 2448.162850] [] ? tcp_rcv_established+0x52d/0x6d2
[ 2448.162859] [] ? tcp_v4_do_rcv+0x65/0x270
[ 2448.162868] [] ? tcp_v4_rcv+0x52a/0x82e
[ 2448.162878] [] ? ip_local_deliver_finish+0x176/0x22f
[ 2448.162887] [] ? __netif_receive_skb+0x3b5/0x3f9
[ 2448.162896] [] ? netif_receive_skb+0x71/0x77
[ 2448.162904] [] ? __netdev_alloc_skb+0x15/0x2f
[ 2448.162918] [] ? virtnet_poll+0x49a/0x582 [virtio_net]
[ 2448.162928] [] ? net_rx_action+0x61/0x11c
[ 2448.162938] [] ? __do_softirq+0x7f/0x106
[ 2448.162947] [] ? call_softirq+0x1c/0x30
[ 2448.162955] [] ? do_softirq+0x31/0x67
[ 2448.162963] [] ? irq_exit+0x44/0x9e
[ 2448.162972] [] ? smp_apic_timer_interrupt+0x85/0x95
[ 2448.162980] [] ? apic_timer_interrupt+0x6b/0x70
[ 2448.162985] [] ? lzjb_compress+0x126/0x1d5 [zfs]
[ 2448.163065] [] ? zio_compress_data+0x63/0xa1 [zfs]
[ 2448.163093] [] ? zio_buf_alloc+0x228/0x50a [zfs]
[ 2448.163121] [] ? zio_execute+0xf1/0x267 [zfs]
[ 2448.163130] [] ? __wake_up+0x35/0x46
[ 2448.163144] [] ? __taskq_dispatch+0x74d/0x993 [spl]
[ 2448.163152] [] ? try_to_wake_up+0x1fd/0x20f
[ 2448.163160] [] ? try_to_wake_up+0x20f/0x20f
[ 2448.163172] [] ? __taskq_dispatch+0x48b/0x993 [spl]
[ 2448.163184] [] ? __taskq_dispatch+0x48b/0x993 [spl]
[ 2448.163193] [] ? kthread+0x81/0x89
[ 2448.163202] [] ? kernel_thread_helper+0x4/0x10
[ 2448.163211] [] ? kthread_worker_fn+0x107/0x107
[ 2448.163219] [] ? gs_change+0xb/0xb
[ 2448.163223] Mem-Info:
[ 2448.163227] DMA per-cpu:
[ 2448.163232] CPU 0: hi: 0, btch: 1 usd: 0
[ 2448.163238] CPU 1: hi: 0, btch: 1 usd: 0
[ 2448.163243] CPU 2: hi: 0, btch: 1 usd: 0
[ 2448.163248] CPU 3: hi: 0, btch: 1 usd: 0
[ 2448.163253] CPU 4: hi: 0, btch: 1 usd: 0
[ 2448.163258] CPU 5: hi: 0, btch: 1 usd: 0
[ 2448.163262] DMA32 per-cpu:
[ 2448.163267] CPU 0: hi: 186, btch: 31 usd: 144
[ 2448.163273] CPU 1: hi: 186, btch: 31 usd: 180
[ 2448.163278] CPU 2: hi: 186, btch: 31 usd: 187
[ 2448.163283] CPU 3: hi: 186, btch: 31 usd: 152
[ 2448.163288] CPU 4: hi: 186, btch: 31 usd: 77
[ 2448.163293] CPU 5: hi: 186, btch: 31 usd: 116
[ 2448.163306] active_anon:160996 inactive_anon:35850 isolated_anon:0
[ 2448.163310] active_file:113792 inactive_file:309822 isolated_file:0
[ 2448.163313] unevictable:0 dirty:0 writeback:0 unstable:0
[ 2448.163315] free:478 slab_reclaimable:48760 slab_unreclaimable:10211
[ 2448.163318] mapped:32266 shmem:91731 pagetables:6227 bounce:0
[ 2448.163337] DMA free:1108kB min:32kB low:40kB high:48kB active_anon:128kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15692kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:80kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 2448.163351] lowmem_reserve[]: 0 3014 3014 3014
[ 2448.163372] DMA32 free:804kB min:7008kB low:8760kB high:10512kB active_anon:643856kB inactive_anon:143400kB active_file:455168kB inactive_file:1239288kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3086548kB mlocked:0kB dirty:0kB writeback:0kB mapped:129064kB shmem:366924kB slab_reclaimable:195040kB slab_unreclaimable:40764kB kernel_stack:3344kB pagetables:24908kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2448.163389] lowmem_reserve[]: 0 0 0 0
[ 2448.163396] DMA: 1*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1100kB
[ 2448.163416] DMA32: 79*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 316kB
[ 2448.163436] 515461 total pagecache pages
[ 2448.163440] 0 pages in swap cache
[ 2448.163454] Swap cache stats: add 0, delete 0, find 0/0
[ 2448.163458] Free swap = 0kB
[ 2448.163461] Total swap = 0kB
[ 2448.177933] 786413 pages RAM
[ 2448.177933] 14632 pages reserved
[ 2448.177933] 265757 pages shared
[ 2448.177933] 658736 pages non-shared
[ 2448.946969] z_wr_iss/1: page allocation failure: order:0, mode:0x20
[ 2448.946974] Pid: 32222, comm: z_wr_iss/1 Tainted: P O 3.3.0-gentoo-r1 #1
[ 2448.946976] Call Trace:
[ 2448.946978] [] ? warn_alloc_failed+0x10a/0x11d
[ 2448.946989] [] ? __alloc_pages_nodemask+0x611/0x66a
[ 2448.946993] [] ? ip_local_deliver_finish+0x176/0x22f
[ 2448.947002] [] ? try_fill_recv+0x58/0x402 [virtio_net]
[ 2448.947052] [] ? virtnet_poll+0x4fc/0x582 [virtio_net]
[ 2448.947056] [] ? net_rx_action+0x61/0x11c
[ 2448.947061] [] ? __do_softirq+0x7f/0x106
[ 2448.947064] [] ? call_softirq+0x1c/0x30
[ 2448.947067] [] ? do_softirq+0x31/0x67
[ 2448.947069] [] ? irq_exit+0x44/0x9e
[ 2448.947071] [] ? do_IRQ+0x94/0xad
[ 2448.947074] [] ? common_interrupt+0x6b/0x6b
[ 2448.947076] [] ? lzjb_compress+0x148/0x1d5 [zfs]
[ 2448.947102] [] ? zio_compress_data+0x63/0xa1 [zfs]
[ 2448.947109] [] ? zio_buf_alloc+0x228/0x50a [zfs]
[ 2448.947117] [] ? zio_execute+0xf1/0x267 [zfs]
[ 2448.947120] [] ? __wake_up+0x35/0x46
[ 2448.947125] [] ? __taskq_dispatch+0x74d/0x993 [spl]
[ 2448.947128] [] ? try_to_wake_up+0x1fd/0x20f
[ 2448.947130] [] ? try_to_wake_up+0x20f/0x20f
[ 2448.947133] [] ? __taskq_dispatch+0x48b/0x993 [spl]
[ 2448.947136] [] ? __taskq_dispatch+0x48b/0x993 [spl]
[ 2448.947140] [] ? kthread+0x81/0x89
[ 2448.947142] [] ? kernel_thread_helper+0x4/0x10
[ 2448.947145] [] ? kthread_worker_fn+0x107/0x107
[ 2448.947147] [] ? gs_change+0xb/0xb
[ 2448.947148] Mem-Info:
[ 2448.947150] DMA per-cpu:
[ 2448.947151] CPU 0: hi: 0, btch: 1 usd: 0
[ 2448.947153] CPU 1: hi: 0, btch: 1 usd: 0
[ 2448.947154] CPU 2: hi: 0, btch: 1 usd: 0
[ 2448.947155] CPU 3: hi: 0, btch: 1 usd: 0
[ 2448.947157] CPU 4: hi: 0, btch: 1 usd: 0
[ 2448.947158] CPU 5: hi: 0, btch: 1 usd: 0
[ 2448.947160] DMA32 per-cpu:
[ 2448.947161] CPU 0: hi: 186, btch: 31 usd: 46
[ 2448.947163] CPU 1: hi: 186, btch: 31 usd: 174
[ 2448.947164] CPU 2: hi: 186, btch: 31 usd: 164
[ 2448.947166] CPU 3: hi: 186, btch: 31 usd: 140
[ 2448.947167] CPU 4: hi: 186, btch: 31 usd: 96
[ 2448.947168] CPU 5: hi: 186, btch: 31 usd: 178
[ 2448.947172] active_anon:161047 inactive_anon:35850 isolated_anon:0
[ 2448.947173] active_file:118176 inactive_file:269876 isolated_file:0
[ 2448.947174] unevictable:0 dirty:0 writeback:0 unstable:0
[ 2448.947175] free:647 slab_reclaimable:47485 slab_unreclaimable:10258
[ 2448.947175] mapped:32266 shmem:91733 pagetables:6227 bounce:0
[ 2448.947181] DMA free:1100kB min:32kB low:40kB high:48kB active_anon:128kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15692kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:80kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 2448.947185] lowmem_reserve[]: 0 3014 3014 3014
[ 2448.947191] DMA32 free:1488kB min:7008kB low:8760kB high:10512kB active_anon:644060kB inactive_anon:143400kB active_file:472704kB inactive_file:1079504kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3086548kB mlocked:0kB dirty:0kB writeback:0kB mapped:129064kB shmem:366932kB slab_reclaimable:189940kB slab_unreclaimable:40952kB kernel_stack:3344kB pagetables:24908kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1 all_unreclaimable? no
[ 2448.947195] lowmem_reserve[]: 0 0 0 0
[ 2448.947198] DMA: 1*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1100kB
[ 2448.947203] DMA32: 100*4kB 21*8kB 20*16kB 9*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1240kB
[ 2448.947209] 479847 total pagecache pages
[ 2448.947210] 0 pages in swap cache
[ 2448.947212] Swap cache stats: add 0, delete 0, find 0/0
[ 2448.947213] Free swap = 0kB
[ 2448.947214] Total swap = 0kB
[ 2448.961858] 786413 pages RAM
[ 2448.961860] 14632 pages reserved
[ 2448.961861] 260803 pages shared
[ 2448.961866] 664021 pages non-shared
[74936.355398] aufs may_rename_srcdir:487:firefox[1717]: renaming dir who has child(ren) on multiple branches, is not supported
[75255.190483] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[75255.190495] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[75255.190503] aufs au_set_h_iptr:89:virtuoso-t[19231]: I/O Error, failed au_xino_write() -5
[75255.192201] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[75255.192210] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[75255.194578] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[75255.195288] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[75856.394401] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[75856.394412] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[75856.396133] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[75856.396142] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[75856.397165] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[75856.397314] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[76457.601424] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[76457.601436] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[76457.603334] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[76457.603344] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[76457.605672] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[76457.606379] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[77058.794310] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[77058.794322] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[77058.795915] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[77058.795926] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[77058.797129] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[77058.797282] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[77659.959561] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[77659.959573] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[77659.961350] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[77659.961363] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[77659.964647] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[77659.965426] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[78261.142429] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[78261.142441] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[78261.143994] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[78261.144013] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[78261.146331] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[78261.146750] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[78862.321751] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[78862.321762] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[78862.323579] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[78862.323589] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[78862.326811] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[78862.327105] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[79463.499176] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[79463.499188] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[79463.500650] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[79463.500660] aufs au_xino_write:416:virtuoso-t[19231]: I/O Error, write failed (-5)
[79463.502972] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)
[79463.505156] aufs au_xino_do_write:380:virtuoso-t[19231]: I/O Error, write failed (-28)

It appears the kernel could not satisfy a page allocation request made by ZFS, which then crippled aufs. I was able to reboot the system from another GNU Screen window. My guess is that the code is missing a call to the OOM killer.
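
For context, mode:0x20 in the warnings above is GFP_ATOMIC (__GFP_HIGH on this kernel), which would explain why the OOM killer never fires: atomic allocations made from interrupt or softirq context are not allowed to sleep, enter direct reclaim, or invoke the OOM killer, so they simply return NULL and the caller is expected to cope. Below is a minimal kernel-style sketch of the pattern in the failing receive path; it is illustrative only, not code from ZFS, virtio_net, or this kernel, and example_rx_alloc is a made-up helper name.

/*
 * Illustrative sketch only -- not code from ZFS, virtio_net, or the 3.3 kernel.
 * mode:0x20 in the warning above is GFP_ATOMIC: the allocation happens in
 * softirq context, so the page allocator may not sleep, may not enter direct
 * reclaim, and may not invoke the OOM killer. It either succeeds from the
 * free lists or fails and returns NULL.
 */
#include <linux/gfp.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical receive-path helper; the name is made up for illustration. */
static struct sk_buff *example_rx_alloc(struct net_device *dev, unsigned int len)
{
        struct sk_buff *skb = __netdev_alloc_skb(dev, len, GFP_ATOMIC);

        if (!skb) {
                /*
                 * Expected failure mode under memory pressure: drop the frame
                 * and refill later rather than trigger the OOM killer or
                 * block inside a softirq.
                 */
                return NULL;
        }
        return skb;
}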

behlendorf (Contributor) commented

Looking at the above stack I wouldn't exactly blame zfs. While zfs is at the top of the stack, you'll notice we take an interrupt about halfway through for the network driver. It's that allocation which fails, although that should at worst trigger some backoff in the network stack. Then you hit the issue in the aufs stack, although -28 is ENOSPC, so is it possible the FS was full?

Not to let zfs totally off the hook here, we might have been partly responsible for the increased memory pressure. I'm hoping to merge the vm changes fairly soon, assuming they continue to test well on my various systems.

https://github.com/behlendorf/zfs/branches/vm
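
The ENOSPC point is easy to double-check against the negative codes aufs prints: -28 and -5 in the log above are just negated errno values. A throwaway userspace check, not part of any of the code above and assuming a Linux/glibc toolchain:

/* Quick check of the errno values seen in the aufs log lines above. */
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        /* aufs prints the negated errno: -28 and -5 in the log. */
        printf("errno 28 -> %s (ENOSPC == %d)\n", strerror(28), ENOSPC);
        printf("errno  5 -> %s (EIO == %d)\n", strerror(5), EIO);
        return 0;
}

So the au_xino_do_write failures are "no space left on device" and the au_xino_write ones are plain I/O errors.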

ryao commented Apr 2, 2012

The aufs is the root filesystem of the Gentoo 12.1 LiveDVD. I had copied a ZFS snapshot into a local pool as part of a Gentoo Linux install and entered a chroot on a ZFS dataset. At the time of this error, I was using a screen session over SSH to build updates. The aufs was far from full when I entered the chroot, although I did not check it after this issue occurred.

Things running outside the chroot likely wrote to the aufs, but nothing should have written enough to fill it. In theory, the only way the build process could have influenced this would have been through the system logger, although I have yet to see any messages logged to the system logger as part of Gentoo's build process. Anything else would have required the build process to escape both the package manager's sandbox and the chroot in which it was running.

You are right that the networking stack is involved, but it is strange that this is occurring in z_wr_iss, which I believe is owned by ZFS. This looks like the same bug reported in issue #540, although that one involved a different networking driver. This is the first time I have seen a page allocation failure myself, although I have heard similar reports of page allocation failures in #zfsonlinux on freenode from people running Debian or Ubuntu; my advice to them had been to report the issue here.

ryao commented Apr 3, 2012

I have identified the cause of this bug as CONFIG_PREEMPT_VOLUNTARY=y.

ryao commented Apr 16, 2012

I am closing this as a duplicate of issue #83.

ryao closed this as completed Apr 16, 2012