Help, how to support cross page hardware L2 prefetch in ARM core? [ARM][Prefetcher][TLB] #1028
-
Hi all,

    for (AddrPriority& addr_prio : addresses) {
        // Block align prefetch address
        addr_prio.first = blockAddress(addr_prio.first);
        if (!samePage(addr_prio.first, pfi.getAddr())) {
            statsQueued.pfSpanPage += 1;
        }
        bool can_cross_page = (tlb != nullptr);
        if (can_cross_page || samePage(addr_prio.first, pfi.getAddr())) {
            PrefetchInfo new_pfi(pfi, addr_prio.first);
            statsQueued.pfIdentified++;
            DPRINTF(HWPrefetch, "Found a pf candidate addr: %#x, "
                    "inserting into prefetch queue.\n", new_pfi.getAddr());
            // Create and insert the request
            insert(pkt, new_pfi, addr_prio.second);
            num_pfs += 1;
            if (num_pfs == max_pfs) {
                break;
            }
        } else {
            DPRINTF(HWPrefetch, "Ignoring page crossing prefetch.\n");
        }
    }

and I registered a TLB with the prefetcher in my own cache configuration file:

    if (system.cpu[i].l2.prefetch_on_access):
        system.cpu[i].l2.prefetcher.registerTLB(system.cpu[i].mmu.l2_shared)

Unfortunately, it does not work. The output is:

    build/ARM/arch/arm/tlb.hh:287: panic: unimplemented
    Memory Usage: 56109796 KBytes
    Thread 1 "gem5.opt" received signal SIGABRT, Aborted.
    __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

and the function at that location is:

    void
    translateTiming(const RequestPtr &req, ThreadContext *tc,
                    BaseMMU::Translation *translation,
                    BaseMMU::Mode mode) override
    {
        panic("unimplemented");
    }

My question is:
Replies: 3 comments 4 replies
-
I'm the gem5-GPT bot, here to assist with your queries about the gem5 simulator. Please note that while I aim to provide accurate information, my responses could be wrong or may have been slightly edited for clarity. You can access the gem5-GPT on chat.openai.com/g/g-o3y1y4t8J-gem5-gpt if you subscribe to ChatGPT.

It seems you're encountering a couple of distinct issues in your work with gem5, specifically when attempting to enhance the prefetcher in the L2 cache to support address prefetching across page boundaries. Let's address each of your questions:

Unimplemented TLB Functions in ARM Architecture

The translateTiming() function in the src/arch/arm/tlb.hh file throwing a "panic: unimplemented" error indicates that this particular method is not implemented for the ARM architecture in your version of gem5 (based on 21.2). This is a common practice in gem5 when a feature is either architecture-specific or hasn't been developed yet for all architectures. In the context of ARM, certain features such as TLB management and memory translation might not be as fully fleshed out or may be handled differently compared to other architectures like x86. This could be due to a variety of reasons, including:
To resolve this, consider the following:
Supporting Cross-Page Prefetching

Your intention to modify the prefetcher to handle cross-page prefetching addresses involves ensuring that the prefetcher does not discard prefetch requests when they cross page boundaries. The provided code shows an attempt to integrate a TLB check to allow cross-page prefetching, but it seems like your approach is running into difficulties because the necessary TLB functionality is missing or unimplemented for ARM. To support wildly prefetching addresses, you might consider:
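One way to reason about the filter in the posted loop is to model it outside gem5. The sketch below is a toy Python model, not gem5 code: it assumes that blockAddress aligns an address down to a cache-block boundary and that samePage compares page-aligned addresses, and the names `block_address`, `same_page`, `filter_candidates`, `BLOCK_SIZE` and `PAGE_BYTES` are all hypothetical, with assumed sizes of 64 and 4096 bytes.

```python
# Toy model of the page-crossing filter in the posted loop.
# BLOCK_SIZE and PAGE_BYTES are assumed values, not read from any gem5 config.
BLOCK_SIZE = 64      # assumed cache line size in bytes
PAGE_BYTES = 4096    # assumed page size in bytes


def block_address(addr: int) -> int:
    """Align an address down to the start of its cache block."""
    return addr & ~(BLOCK_SIZE - 1)


def same_page(a: int, b: int) -> bool:
    """True when both addresses fall inside the same page."""
    return a // PAGE_BYTES == b // PAGE_BYTES


def filter_candidates(trigger_addr, candidates, has_tlb, max_pfs):
    """Return (accepted block addresses, number of page-spanning candidates)."""
    accepted = []
    span_page = 0
    for addr in candidates:
        addr = block_address(addr)
        crosses = not same_page(addr, trigger_addr)
        if crosses:
            span_page += 1  # counted whether or not the candidate is kept
        if has_tlb or not crosses:
            accepted.append(addr)
            if len(accepted) == max_pfs:
                break
        # otherwise the page-crossing candidate is dropped
    return accepted, span_page


trigger = 0x1F80                       # inside the page 0x1000-0x1FFF
candidates = [0x1FD0, 0x2010, 0x2040]  # first stays in the page, the rest cross

without_tlb = filter_candidates(trigger, candidates, has_tlb=False, max_pfs=4)
with_tlb = filter_candidates(trigger, candidates, has_tlb=True, max_pfs=4)
```

In this model, a registered translation object only changes which candidates are kept. It says nothing about whether the translation itself succeeds, which is where the panic in the original post comes from.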
Final Recommendations
-
Ping @giactra. Do you have any pointers here?
-
Hi @xfbingshan. Thanks for reporting this. A bit of history is required. As I mentioned earlier, the translation logic was moved from the ArmTLB object to the ArmMMU. In the ArmTLB, panics were added to make sure the old interface was not used anymore.
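The arrangement described here can be illustrated with a small toy model. This is a hedged Python sketch, not gem5 code: `ToyTLB`, `ToyMMU`, `translate`, `translate_timing` and `PAGE_BYTES` are hypothetical names, the page table is a plain dictionary standing in for a table walk, and a raised `NotImplementedError` stands in for the `panic("unimplemented")` guard.

```python
# Toy model only: ToyTLB, ToyMMU and PAGE_BYTES are hypothetical names,
# not gem5 classes or parameters.
PAGE_BYTES = 4096  # assumed page size in bytes


class ToyTLB:
    """A TLB reduced to a lookup cache; it no longer performs translations."""

    def __init__(self):
        self.entries = {}  # virtual page number -> physical page number

    def lookup(self, vpn):
        return self.entries.get(vpn)

    def insert(self, vpn, ppn):
        self.entries[vpn] = ppn

    def translate_timing(self, vaddr):
        # Stands in for the panic("unimplemented") guard on the old interface.
        raise NotImplementedError("translate through the MMU instead")


class ToyMMU:
    """Owns the TLB hierarchy and is the only translation entry point."""

    def __init__(self, page_table):
        self.dtlb = ToyTLB()
        self.l2_shared = ToyTLB()
        self.page_table = page_table  # stands in for a table walk

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_BYTES)
        ppn = self.dtlb.lookup(vpn)
        if ppn is None:
            ppn = self.l2_shared.lookup(vpn)
        if ppn is None:
            ppn = self.page_table[vpn]  # a KeyError here models a fault
            self.l2_shared.insert(vpn, ppn)
        self.dtlb.insert(vpn, ppn)
        return ppn * PAGE_BYTES + offset


mmu = ToyMMU(page_table={0x2: 0x80})
paddr = mmu.translate(0x2040)  # going through the MMU works

try:
    mmu.l2_shared.translate_timing(0x2040)  # old per-TLB interface is rejected
    old_interface_ok = True
except NotImplementedError:
    old_interface_ok = False
```

The model shows why handing a bare TLB to a client that expects to call a translate method on it fails: in this design the TLB only caches entries, and the translation entry point lives on the MMU.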
Hi @xfbingshan. Thanks for reporting this.
A bit of history is required.
gem5 did not employ the concept of an MMU until 2020-2021. Whenever the CPU needed to translate a virtual address, it called the translateAtomic/translateTiming methods of the appropriate TLB directly (the DTLB for data accesses and the ITLB for instruction accesses).
The need to model a hierarchical TLB setup (e.g. with a shared L2) required us to move most of the translation logic into the new MMU class [1], and to stop the CPU from interfacing directly with the TLB. Instead, the CPU would hold a reference to the architectural MMU object, which would then be the trans…