
cpuinfo returns 0 for cpuinfo_get_l2_caches_count on some devices #4654

Open
ScottTodd opened this issue Jan 28, 2021 · 8 comments
Labels
bug 🐞 (Something isn't working), hal/cpu (Runtime Host/CPU-based HAL backend), help wanted (Extra attention is needed)

Comments

@ScottTodd
Member

ScottTodd commented Jan 28, 2021

Discord discussion: https://discord.com/channels/689900678990135345/689906000043573354/804488403936215090

cpuinfo_get_l2_caches_count() returns 0 on some machines, so our code that tries to pick the number of workers based on the L2 cache count bails out to a single-threaded fallback:

https://github.com/google/iree/blob/50d9823218605f1707abb375419df47c5d2ef28c/iree/task/topology.c#L356-L378

@ScottTodd ScottTodd added the bug 🐞 and hal/cpu labels Jan 28, 2021
@benvanik
Collaborator

@benvanik benvanik changed the title from "Dylib driver creation fails on devices which report zero l2 caches through cpuinfo" to "cpuinfo returns 0 for cpuinfo_get_l2_caches_count on some devices" Jan 28, 2021
@benvanik
Collaborator

Leaving this here for now, as we are working around it by falling back to 1 thread when this happens. If someone with a machine that returns 0 could debug on it, file an upstream cpuinfo bug, etc., that'd be helpful. Otherwise you get 1 thread :)

@benvanik benvanik removed their assignment Jan 28, 2021
@benvanik benvanik added the help wanted label Jan 28, 2021
@bjacob
Contributor

bjacob commented Jan 29, 2021

Notes:

  • It is common even for real CPUs (not just emulators) on the ARM architecture to have no L2 cache, either because they skip from L1 to L3 (typical of some phone-class CPUs) or because they have nothing at all beyond L1 (some microcontroller-class CPUs).
  • Of course, when running on an emulator, it's even less surprising that there would be 0 L2 caches.

TLDR: this seems normal at both the qemu and cpuinfo level, and will happen not only on qemu.

@benvanik
Collaborator

This is super useful information, @bjacob, thanks! The current strategy (heh, "strategy" is a stretch) is just a placeholder anyway. I'd love to sync more about reasonable defaults, specializations, etc. I was mostly waiting until Mahesh's queue is flushed with the linalg-on-tensors work, so that we have some parallel workloads, and until the CPU threading lands in the HAL rewrite, so that we'd have something concrete to talk about.

@bjacob
Contributor

bjacob commented Jan 29, 2021

No problem! Whenever you want to get back to this, you could take a look at this code that I wrote with much help from Marat for ruy's needs. It's not exactly the same as what you're doing, but it does wrestle with a similar issue: using cpuinfo to distinguish shared vs. non-shared ("local") caches:
https://github.com/google/ruy/blob/2887692065c38ef6617f423feafc6b69dd0a0681/ruy/cpuinfo.cc#L42-L83
The key part is this condition, which Marat wrote down for me:
https://github.com/google/ruy/blob/2887692065c38ef6617f423feafc6b69dd0a0681/ruy/cpuinfo.cc#L59-L63
With this logic, we detect which cache is effectively the last level of cache that is local to each processor, which would indeed be the L2 cache on a majority of current CPUs but would be the L1 cache when there is no L2 --- i.e. avoiding making any assumptions about level N cache being special for any particular value of N.

@benvanik
Collaborator

That's fantastic code and effectively what I was reaching for when I originally wrote this and then punted on. I'll see if I can adapt that. Did you find any way to test that besides actually grabbing a device with certain characteristics? (it'd be cool if cpuinfo could mock out certain devices for testing - like "pretend you are X" - maybe it can?)

@bjacob
Contributor

bjacob commented Jan 29, 2021

Another note about ARM CPUs: in addition to the case mentioned above where there is no L2 cache, there are other CPUs with only L1 and L2, with the L2 cache shared across cores!

Maybe ask Marat directly about cpuinfo mocking; I don't know myself.

@bjacob
Contributor

bjacob commented Jan 29, 2021

Ah, cpuinfo does support mocking: in its public include/ directory, right beside cpuinfo.h, there is cpuinfo-mock.h:
https://github.com/pytorch/cpuinfo/blob/master/include/cpuinfo-mock.h

benvanik added a commit that referenced this issue Jan 29, 2021
Fallback to 1 thread for when cpuinfo_get_cores_count()==0 and core
count when cpuinfo_get_l2_caches_count()==0.
Issue #4654 is tracking making this better.

3 participants