Help, how to support cross page hardware L2 prefetch in ARM core? [ARM][Prefetcher][TLB] #1028

xfbingshan · 2024-04-15T08:32:23Z

Hi all,
My gem5 version is based on 21.2.
I want to add a prefetcher to the L2 cache whose predicted addresses can cross page boundaries freely. The default prefetcher configuration does not support page crossing, so those prefetch addresses are never sent to memory. The relevant code in queued.cc is:

for (AddrPriority& addr_prio : addresses) {
    // Block align prefetch address
    addr_prio.first = blockAddress(addr_prio.first);

    if (!samePage(addr_prio.first, pfi.getAddr())) {
        statsQueued.pfSpanPage += 1;
    }

    bool can_cross_page = (tlb != nullptr);
    if (can_cross_page || samePage(addr_prio.first, pfi.getAddr())) {
        PrefetchInfo new_pfi(pfi, addr_prio.first);
        statsQueued.pfIdentified++;
        DPRINTF(HWPrefetch, "Found a pf candidate addr: %#x, "
                "inserting into prefetch queue.\n", new_pfi.getAddr());
        // Create and insert the request
        insert(pkt, new_pfi, addr_prio.second);
        num_pfs += 1;
        if (num_pfs == max_pfs) {
            break;
        }
    } else {
        DPRINTF(HWPrefetch, "Ignoring page crossing prefetch.\n");
    }
}
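
To make the filtering step in the loop above concrete, here is a minimal Python sketch of the same decision. It is a toy model, not gem5 code: the 64-byte block size, the 4 KiB page size, and the names block_address, same_page and filter_candidates are assumptions chosen for illustration.

```python
# Toy model of the candidate filter in the loop above; not gem5 code.
BLOCK_BYTES = 64    # assumed cache line size
PAGE_BYTES = 4096   # assumed 4 KiB page (PageShift = 12)


def block_address(addr, block_bytes=BLOCK_BYTES):
    """Round an address down to the start of its cache block."""
    return addr & ~(block_bytes - 1)


def same_page(a, b, page_bytes=PAGE_BYTES):
    """Return True when both addresses fall inside the same page."""
    return (a // page_bytes) == (b // page_bytes)


def filter_candidates(trigger_addr, candidates, has_tlb, max_pfs):
    """Return the block-aligned candidates that would be queued."""
    accepted = []
    for addr in candidates:
        addr = block_address(addr)
        # Without a translation provider, page-crossing candidates are dropped.
        if has_tlb or same_page(addr, trigger_addr):
            accepted.append(addr)
            if len(accepted) == max_pfs:
                break
    return accepted


trigger = 0x1FC0                       # lies in the page starting at 0x1000
candidates = [0x2010, 0x1F85, 0x5000]  # two of these leave that page
without_tlb = filter_candidates(trigger, candidates, has_tlb=False, max_pfs=4)
with_tlb = filter_candidates(trigger, candidates, has_tlb=True, max_pfs=4)
```

In this sketch only the candidate that stays in the trigger's page survives when no translation provider is registered, which matches the "Ignoring page crossing prefetch" branch.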

I then registered a TLB with the prefetcher in my own cache configuration file:

if (system.cpu[i].l2.prefetch_on_access):
    system.cpu[i].l2.prefetcher.registerTLB(system.cpu[i].mmu.l2_shared)

Unfortunately, this does not work. The simulation aborts with the following output:

build/ARM/arch/arm/tlb.hh:287: panic: unimplemented
Memory Usage: 56109796 KBytes

Thread 1 "gem5.opt" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

The corresponding code in src/arch/arm/tlb.hh is:

void
translateTiming(const RequestPtr &req, ThreadContext *tc,
                BaseMMU::Translation *translation,
                BaseMMU::Mode mode) override
{
    panic("unimplemented");
}

My questions are:

Why are TLB functions such as translateTiming() unimplemented only for the Arm architecture? Is there a restriction I am not aware of?
Is there a shortcut that allows prefetch addresses that freely cross pages to be issued?

ivanaamit · 2024-04-22T16:12:53Z

Maintainer

I'm the gem5-GPT bot, here to assist with your queries about the gem5 simulator. Please note that while I aim to provide accurate information, my responses could be wrong or may have been slightly edited for clarity. You can access the gem5-GPT on chat.openai.com/g/g-o3y1y4t8J-gem5-gpt if you subscribe to chatGPT.

It seems you're encountering a couple of distinct issues in your work with gem5, specifically when attempting to enhance the prefetcher in the L2 cache to support address prefetching across page boundaries. Let's address each of your questions:

Unimplemented TLB Functions in ARM Architecture

The translateTiming() function in the src/arch/arm/tlb.hh file throwing a "panic: unimplemented" error indicates that this particular method is not implemented for the ARM architecture in your version of gem5 (based on 21.2). This is a common practice in gem5 when a feature is either architecture-specific or hasn't been developed yet for all architectures.

In the context of ARM, certain features such as TLB management and memory translation might not be as fully fleshed out or may be handled differently compared to other architectures like x86. This could be due to a variety of reasons, including:

Priority and Usage: ARM support in gem5 may not have had as high a development priority at the time, or specific use cases may have drawn less attention to these areas.
Community Contributions: gem5 is community-driven, and certain features depend on the contributions from researchers and developers with interest in specific architectures.

To resolve this, consider the following:

Check for Updates: Ensure that there are no newer versions of gem5 or community patches that implement the required functionality for ARM.
Implement the Method: If you are comfortable with modifying gem5's source code, you could try to implement translateTiming() yourself.
Community and Support Forums: Engage with the gem5 community via mailing lists or forums. Other users or the developers might have encountered and solved similar issues.

Supporting Cross-Page Prefetching

Your intention to modify the prefetcher to handle cross-page prefetching addresses involves ensuring that the prefetcher does not discard prefetch requests when they cross page boundaries. The provided code shows an attempt to integrate a TLB check to allow cross-page prefetching, but it seems like your approach is running into difficulties because the necessary TLB functionality is missing or unimplemented for ARM.

To support wildly prefetching addresses, you might consider:

Modifying Prefetcher Logic: Adapt the prefetching logic to be more tolerant of cross-page requests. This could involve removing or adjusting the checks that prevent cross-page prefetching if TLB support is not viable.
Using a Simpler Heuristic: If detailed address translation via a TLB is not critical for your use case, you could implement a heuristic-based approach that allows prefetching across pages without detailed translation checks.

Final Recommendations

Debugging and Logs: Utilize DPRINTF or similar debugging tools to trace the execution and understand where exactly the failures occur.
Documentation and Source Code: Review the latest gem5 documentation and source code to better understand how ARM architecture is handled and if there have been any recent improvements or relevant changes.

xfbingshan Apr 23, 2024
Author

Thanks.

Update:

I tried modifying the prefetcher logic. Because I can use the virtual address directly as the physical address, I simplified the problem: the predicted cross-page address is inserted into the prefetch queue pfq instead of pfqMissingTranslation, the queue for prefetches that are missing a translation.
New question: I want to change the page size from the default 4KB to 2MB to reduce the number of predicted addresses that cross a page, but it failed. I found this code:

namespace ArmISA
{
    const Addr PageShift = 12;
    const Addr PageBytes = 1ULL << PageShift;

} // namespace ArmISA

I changed it to:

namespace ArmISA
{
    const Addr PageShift = 21;
    const Addr PageBytes = 1ULL << PageShift;

} // namespace ArmISA

The error is:

build/ARM/sim/vma.cc:117: void gem5::VMA::sanityCheck(): Assertion `(_addrRange.start() % _pageBytes) == 0' failed.

I am wondering what the correct modification is.
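
For readers following along, the arithmetic behind that assertion can be sketched in a few lines of Python. The assertion requires the start of a memory region to be a multiple of the page size. The region address below is hypothetical, and the idea that some SE-mode region is placed with 4KB alignment in mind is only an assumption about where the failure comes from, not something confirmed in this thread.

```python
# Illustrative arithmetic only; the address below is hypothetical.
def is_page_aligned(addr, page_shift):
    """Mirror the check (start % pageBytes) == 0 for a given PageShift."""
    page_bytes = 1 << page_shift
    return addr % page_bytes == 0


def round_up_to_page(addr, page_shift):
    """Round an address up to the next page boundary."""
    page_bytes = 1 << page_shift
    return (addr + page_bytes - 1) & ~(page_bytes - 1)


region_start = 0x403000  # hypothetical region start, aligned to 4KB

ok_with_4k = is_page_aligned(region_start, 12)   # PageShift = 12
ok_with_2m = is_page_aligned(region_start, 21)   # PageShift = 21
next_2m_boundary = round_up_to_page(region_start, 21)
```

An address that is fine for 4KB pages is generally not a multiple of 2MB, so any code that computes region addresses with the old alignment would trip the check once PageShift alone is changed.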

andysan · 2024-04-23T16:10:06Z

Maintainer

Ping @giactra. Do you have any pointers here?

giactra · 2024-04-23T17:11:01Z

Maintainer

Hi @xfbingshan. Thanks for reporting this.

A bit of history is required.
gem5 did not have the concept of an MMU until 2020-2021. Whenever the CPU needed to translate a virtual address, it called the translateAtomic/translateTiming methods of the appropriate TLB directly (the DTLB for data accesses and the ITLB for instruction accesses).
The need to model a hierarchical TLB setup (e.g. with a shared L2) required us to move most of the translation logic into the new MMU class [1] and to stop the CPU from interfacing directly with the TLB. Instead, the CPU holds a reference to the architectural MMU object, which is then the translation provider for the CPU.

As mentioned above, the translation logic was moved from the ArmTLB object to the ArmMMU. Panics were added to the ArmTLB to make sure the old interface was no longer used.
What I definitely missed was the cross-page prefetcher, which was still using the TLB interface for translation. This was fixed later [2], and the fix has been integrated into more recent releases.
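
The delegation described above can be pictured with a small Python toy model. This is not gem5 code: ToyTLB, ToyMMU, their methods, and the dictionary used as a page table are all hypothetical stand-ins, intended only to show why a client that still calls the TLB's old entry point fails while a client that goes through the MMU gets a translation.

```python
# Toy model of the MMU/TLB split; all names here are hypothetical.
class ToyTLB:
    """Only caches translations; the old translate entry point is disabled."""

    def __init__(self):
        self.entries = {}

    def translate_timing(self, vaddr):
        # Mirrors the panic("unimplemented") guard on the old interface.
        raise NotImplementedError("unimplemented")

    def lookup(self, vpn):
        return self.entries.get(vpn)

    def insert(self, vpn, ppn):
        self.entries[vpn] = ppn


class ToyMMU:
    """Owns the translation logic and uses the TLB purely as a cache."""

    PAGE_SHIFT = 12  # assumed 4 KiB pages

    def __init__(self, tlb, page_table):
        self.tlb = tlb
        self.page_table = page_table  # stand-in for a table walk

    def translate_timing(self, vaddr):
        vpn = vaddr >> self.PAGE_SHIFT
        offset = vaddr & ((1 << self.PAGE_SHIFT) - 1)
        ppn = self.tlb.lookup(vpn)
        if ppn is None:
            ppn = self.page_table[vpn]
            self.tlb.insert(vpn, ppn)
        return (ppn << self.PAGE_SHIFT) | offset


tlb = ToyTLB()
mmu = ToyMMU(tlb, page_table={0x2: 0x80})
paddr = mmu.translate_timing(0x2040)  # goes through the MMU and succeeds
```

A prefetcher that holds a reference to the TLB and calls its translate method corresponds to the failing case in this thread; one that asks the MMU corresponds to the fixed behaviour.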

xfbingshan Apr 24, 2024
Author

Thanks a lot. 😄
I was negligent. I only checked the newest code in src/arch/arm/tlb.hh but missed src/mem/cache/prefetch/queued.cc, which is more important. I understand now, and I will try to apply the patch to my code and see what happens.

BTW, any suggestions for my new problem?

modify the page size from default 4KB to 2MB to reduce the number of across page predicted address but failed

giactra Apr 25, 2024
Maintainer

Are you running in SE or FS mode?

xfbingshan Apr 26, 2024
Author

SE mode
