-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop clearing access bits during page migration #6
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Ian May <[email protected]>
Signed-off-by: Ian May <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Ian May <[email protected]>
Ignore: yes Signed-off-by: Ian May <[email protected]>
Signed-off-by: Ian May <[email protected]>
Signed-off-by: Ian May <[email protected]>
Signed-off-by: Ian May <[email protected]>
This reverts commit 93f3643. Signed-off-by: Dimitri John Ledkov <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Dimitri John Ledkov <[email protected]>
Ignore: yes Signed-off-by: Dimitri John Ledkov <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1992162 Properties: no-test-build Signed-off-by: Dimitri John Ledkov <[email protected]>
Signed-off-by: Dimitri John Ledkov <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Ignore: yes Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1996300 Properties: no-test-build Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Ignore: yes Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1998795 Properties: no-test-build Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1999750 Currently Ubuntu Linux kernel header packages use stock Makefile as shipped by upstream kernel. During linux kernel package build however HOSTCC and CC are typically overridden to use explicit complier version $(DEB_HOST_GNU_TYPE)-gcc-12. This can lead to dkms module build failures as despite all efforts to reuse matching gcc version out of .config, various shell scripts / build systems / makefiles do not pass the CC variable as a make variable to the end make call that is used to build dkms modules. To avoid this, hardcode the correct compiler in the linux headers package shipped Makefile. This is similar to the Makefile includes that debian ships, albeit with less indirections. Tested by applying this patch to the hwe-5.19 kernel, rebuilding it, and testing that dkms module building is using expected compilers now. Signed-off-by: Dimitri John Ledkov <[email protected]> Acked-by: Kleber Sacilotto de Souza <[email protected]> Acked-by: Stefan Bader <[email protected]> Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Ignore: yes Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/1999745 Properties: no-test-build Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002679 This reverts commit e96b739. In order to use the same compiler as used for kinetic:linux (gcc-12) we have made some changes to explicitly compile jammy:linux-hwe-5.19 with the same compiler. However, such changes are breaking the build of a number of dkms packages. Revert this commit, which would make the kernel compile with the default GCC in Jammy (gcc-11), while a proper solution is being worked on. Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Ignore: yes Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002679 Changing the gcc version used for the build from gcc-12 to gcc (11) removed CONFIG_INIT_STACK_ALL_ZERO and CONFIG_SHADOW_CALL_STACK from being an option on the kernel config. Update the annotations file accondingly, keeping the annotation instead of simply removing it so when the configs are re-enabled in the future it stays as a reminder to update the annotation. Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2001755 Properties: no-test-build Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
…3.01.02) BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002170 Signed-off-by: Brad Figg <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Jacob Martin <[email protected]> Acked-by: Ian May <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002170 Signed-off-by: Ian May <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002574 Tegra234 and later chips have hardware feature that detects wait state raised by slave as mentioned in TCG client doc. Signed-off-by: Krishna Yarlagadda <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Ian May <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002574 Trusted Platform Module defines flow control where slave will drive data line at specified clock cycles. Tegra241 has TPM wait state detections support when using combined sequence transfers. Enabling the feature based on dt flag Signed-off-by: Krishna Yarlagadda <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Ian May <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off-by: Brad Figg <[email protected]>
Ignore: yes Signed-off-by: Brad Figg <[email protected]>
…dkms packages to be included Signed-off-by: Brad Figg <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002688 Properties: no-test-build Signed-off-by: Brad Figg <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
Ignore: yes Signed-off-by: Brad Figg <[email protected]>
… definition Signed-off-by: Brad Figg <[email protected]>
CONFIG_UBUNTU_ODM_DRIVERS to 'n' Signed-off-by: Brad Figg <[email protected]>
Signed-off-by: Brad Figg <[email protected]>
Ignore: yes Signed-off-by: Ian May <[email protected]> Signed-off-by: Brad Figg <[email protected]>
Signed-off-by: Ian May <[email protected]>
Ignore: yes Signed-off-by: Ian May <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2002688 Properties: no-test-build Signed-off-by: Ian May <[email protected]>
Signed-off-by: Ian May <[email protected]>
Signed-off-by: Ian May <[email protected]>
Commit e5251fd ("mm/hugetlb: introduce set_huge_swap_pte_at() helper") add set_huge_swap_pte_at() to handle swap entries on architectures that support hugepages consisting of contiguous ptes. And currently the set_huge_swap_pte_at() is only overridden by arm64. set_huge_swap_pte_at() provide a sz parameter to help determine the number of entries to be updated. But in fact, all hugetlb swap entries contain pfn information, so we can find the corresponding folio through the pfn recorded in the swap entry, then the folio_size() is the number of entries that need to be updated. And considering that users will easily cause bugs by ignoring the difference between set_huge_swap_pte_at() and set_huge_pte_at(). Let's handle swap entries in set_huge_pte_at() and remove the set_huge_swap_pte_at(), then we can call set_huge_pte_at() anywhere, which simplifies our coding. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qi Zheng <[email protected]> Acked-by: Muchun Song <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit 18f3962) Signed-off-by: Alistair Popple <[email protected]>
…n_pmd vma->vm_page_prot is read lockless from the rmap_walk, it may be updated concurrently. Using READ_ONCE to prevent the risk of reading intermediate values. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit 4286f14) Signed-off-by: Alistair Popple <[email protected]>
Replace all the magic "5" with the macro. Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Huang Ying <[email protected]> Signed-off-by: Peter Xu <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit 9c61d53) Signed-off-by: Alistair Popple <[email protected]>
swapops.h contains quite a few layers of ifdef, some of the "else" and "endif" doesn't get proper comment on the macro so it's hard to follow on what are they referring to. Add the comments. Suggested-by: Nadav Amit <[email protected]> Reviewed-by: Huang Ying <[email protected]> Signed-off-by: Peter Xu <[email protected]> Reviewed-by: Alistair Popple <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit eba4d77) Signed-off-by: Alistair Popple <[email protected]>
We've got a bunch of special swap entries that stores PFN inside the swap offset fields. To fetch the PFN, normally the user just calls swp_offset() assuming that'll be the PFN. Add a helper swp_offset_pfn() to fetch the PFN instead, fetching only the max possible length of a PFN on the host, meanwhile doing proper check with MAX_PHYSMEM_BITS to make sure the swap offsets can actually store the PFNs properly always using the BUILD_BUG_ON() in is_pfn_swap_entry(). One reason to do so is we never tried to sanitize whether swap offset can really fit for storing PFN. At the meantime, this patch also prepares us with the future possibility to store more information inside the swp offset field, so assuming "swp_offset(entry)" to be the PFN will not stand any more very soon. Replace many of the swp_offset() callers to use swp_offset_pfn() where proper. Note that many of the existing users are not candidates for the replacement, e.g.: (1) When the swap entry is not a pfn swap entry at all, or, (2) when we wanna keep the whole swp_offset but only change the swp type. For the latter, it can happen when fork() triggered on a write-migration swap entry pte, we may want to only change the migration type from write->read but keep the rest, so it's not "fetching PFN" but "changing swap type only". They're left aside so that when there're more information within the swp offset they'll be carried over naturally in those cases. Since at it, dropping hwpoison_entry_to_pfn() because that's exactly what the new swp_offset_pfn() is about. Signed-off-by: Peter Xu <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit 0d206b5) Signed-off-by: Alistair Popple <[email protected]>
Carry over the dirty bit from pmd to pte when a huge pmd splits. It shouldn't be a correctness issue since when pmd_dirty() we'll have the page marked dirty anyway, however having dirty bit carried over helps the next initial writes of split ptes on some archs like x86. Reviewed-by: Huang Ying <[email protected]> Signed-off-by: Peter Xu <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit 0ccf7f1) Signed-off-by: Alistair Popple <[email protected]>
When page migration happens, we always ignore the young/dirty bit settings in the old pgtable, and marking the page as old in the new page table using either pte_mkold() or pmd_mkold(), and keeping the pte clean. That's fine from functional-wise, but that's not friendly to page reclaim because the moving page can be actively accessed within the procedure. Not to mention hardware setting the young bit can bring quite some overhead on some systems, e.g. x86_64 needs a few hundreds nanoseconds to set the bit. The same slowdown problem to dirty bits when the memory is first written after page migration happened. Actually we can easily remember the A/D bit configuration and recover the information after the page is migrated. To achieve it, define a new set of bits in the migration swap offset field to cache the A/D bits for old pte. Then when removing/recovering the migration entry, we can recover the A/D bits even if the page changed. One thing to mention is that here we used max_swapfile_size() to detect how many swp offset bits we have, and we'll only enable this feature if we know the swp offset is big enough to store both the PFN value and the A/D bits. Otherwise the A/D bits are dropped like before. Signed-off-by: Peter Xu <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit 2e34687) Signed-off-by: Alistair Popple <[email protected]>
We used to have swapfile_maximum_size() fetching a maximum value of swapfile size per-arch. As the caller of max_swapfile_size() grows, this patch introduce a variable "swapfile_maximum_size" and cache the value of old max_swapfile_size(), so that we don't need to calculate the value every time. Caching the value in swapfile_init() is safe because when reaching the phase we should have initialized all the relevant information. Here the major arch to take care of is x86, which defines the max swapfile size based on L1TF mitigation. Here both X86_BUG_L1TF or l1tf_mitigation should have been setup properly when reaching swapfile_init(). As a reference, the code path looks like this for x86: - start_kernel - setup_arch - early_cpu_init - early_identify_cpu --> setup X86_BUG_L1TF - parse_early_param - l1tf_cmdline --> set l1tf_mitigation - check_bugs - l1tf_select_mitigation --> set l1tf_mitigation - arch_call_rest_init - rest_init - kernel_init - kernel_init_freeable - do_basic_setup - do_initcalls --> calls swapfile_init() (initcall level 4) The swapfile size only depends on swp pte format on non-x86 archs, so caching it is safe too. Since at it, rename max_swapfile_size() to arch_max_swapfile_size() because arch can define its own function, so it's more straightforward to have "arch_" as its prefix. At the meantime, export swapfile_maximum_size to replace the old usages of max_swapfile_size(). Signed-off-by: Peter Xu <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit be45a49) Signed-off-by: Alistair Popple <[email protected]>
Introduce a variable swap_migration_ad_supported to cache whether the arch supports swap migration A/D bits. Here one thing to mention is that SWP_MIG_TOTAL_BITS will internally reference the other macro MAX_PHYSMEM_BITS, which is a function call on x86 (constant on all the rest of archs). It's safe to reference it in swapfile_init() because when reaching here we're already during initcalls level 4 so we must have initialized 5-level pgtable for x86_64 (right after early_identify_cpu() finishes). - start_kernel - setup_arch - early_cpu_init - get_cpu_cap --> fetch from CPUID (including X86_FEATURE_LA57) - early_identify_cpu --> clear X86_FEATURE_LA57 (if early lvl5 not enabled (USE_EARLY_PGTABLE_L5)) - arch_call_rest_init - rest_init - kernel_init - kernel_init_freeable - do_basic_setup - do_initcalls --> calls swapfile_init() (initcall level 4) This should slightly speed up the migration swap entry handlings. Signed-off-by: Peter Xu <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit 5154e60) Signed-off-by: Alistair Popple <[email protected]>
We use "unsigned long" to store a PFN in the kernel and phys_addr_t to store a physical address. On a 64bit system, both are 64bit wide. However, on a 32bit system, the latter might be 64bit wide. This is, for example, the case on x86 with PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans 32bit. The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses that case, and assumes that the maximum PFN is limited by an 32bit phys_addr_t. This implies, that SWP_PFN_BITS will currently only be able to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong. Let's rely on the number of bits in phys_addr_t instead, but make sure to not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in is_pfn_swap_entry() unhappy. Note that swp_entry_t is effectively an unsigned long and the maximum swap offset shares that value with the swap type. For example, on an 8 GiB x86 PAE system with a kernel config based on Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently fail removing migration entries (remove_migration_ptes()), because mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as swp_offset_pfn() wrongly masks off PFN bits. For example, split_huge_page_to_list()->...->remap_page() will leave migration entries in place and continue to unlock the page. Later, when we stumble over these migration entries (e.g., via /proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these migration entries shouldn't exist anymore and the page was unlocked. [ 33.067591] kernel BUG at include/linux/swapops.h:497! [ 33.067597] invalid opcode: 0000 [NVIDIA-BaseOS-6#1] PREEMPT SMP NOPTI [ 33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G E 6.1.0-rc8+ #16 [ 33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 [ 33.067606] EIP: pagemap_pmd_range+0x644/0x650 [ 33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31 [ 33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000 [ 33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68 [ 33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246 [ 33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0 [ 33.067625] Call Trace: [ 33.067628] ? madvise_free_pte_range+0x720/0x720 [ 33.067632] ? smaps_pte_range+0x4b0/0x4b0 [ 33.067634] walk_pgd_range+0x325/0x720 [ 33.067637] ? mt_find+0x1d6/0x3a0 [ 33.067641] ? mt_find+0x1d6/0x3a0 [ 33.067643] __walk_page_range+0x164/0x170 [ 33.067646] walk_page_range+0xf9/0x170 [ 33.067648] ? __kmem_cache_alloc_node+0x2a8/0x340 [ 33.067653] pagemap_read+0x124/0x280 [ 33.067658] ? default_llseek+0x101/0x160 [ 33.067662] ? smaps_account+0x1d0/0x1d0 [ 33.067664] vfs_read+0x90/0x290 [ 33.067667] ? do_madvise.part.0+0x24b/0x390 [ 33.067669] ? debug_smp_processor_id+0x12/0x20 [ 33.067673] ksys_pread64+0x58/0x90 [ 33.067675] __ia32_sys_ia32_pread64+0x1b/0x20 [ 33.067680] __do_fast_syscall_32+0x4c/0xc0 [ 33.067683] do_fast_syscall_32+0x29/0x60 [ 33.067686] do_SYSENTER_32+0x15/0x20 [ 33.067689] entry_SYSENTER_32+0x98/0xf1 Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it readable and consistent. [[email protected]: rely on sizeof(phys_addr_t) and min_t() instead] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: use "int" for comparison, as we're only comparing numbers < 64] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 0d206b5 ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry") Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Peter Xu <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> nvbug: https://nvbugswb.nvidia.com/NVBugs5/redir.aspx?url=/4067536 (cherry picked from commit 630dc25) Signed-off-by: Alistair Popple <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
UVM uses migrate_vma functionality to handle some page faults. Prior to a fix in kernel v6.1 this would clear the PTE access flag leading to increased faults. This pull request is to backport required patches to implement the fix.