Fix parsing in vpu_count on workstation SKX #351

loveshack · 2019-10-07T15:33:10Z

This correctly treats
Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz

devinamatthews

Since all Xeon W have 2 VPUs, we should probably just check the "W" part and return 2. The current code doesn't handle new processors W-32XX, W-22XX, and W-3175X.

loveshack · 2019-10-07T20:35:58Z

> Since all Xeon W have 2 VPUs,

Not according to https://ark.intel.com/content/www/us/en/ark/products/125038/intel-xeon-w-2102-processor-8-25m-cache-2-90-ghz.html. See also https://github.com/jeffhammond/vpu-count/blob/master/vpu-count.c, but that's broken under Linux 4.19, at least.

I now realize both BLIS' and Jeff's code fail according to https://ark.intel.com/content/www/us/en/ark/products/codename/37572/skylake.html, e.g. i9 and i7 with 2 FMA. The logic clearly needs revisiting. Is there a more systematic approach than going through the product list (which doesn't have W210x)? Also, Wikipedia notes that Gold 5122 has two units, not the one that would be identified currently.
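For illustration, the kind of brand-string check under discussion might look like the sketch below. It is not the BLIS or vpu-count code: the function name is hypothetical, the single-unit entries are the W-2102 and W-2104 models raised in this thread, and the 2120 to 2199 range for two-unit parts is an assumption about the first-generation W-21xx line. Anything else is reported as unknown rather than guessed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the BLIS implementation: returns the number of
   AVX-512 FMA units for a Xeon W brand string, or -1 if it cannot tell. */
int example_xeon_w_vpu_count(const char *brand)
{
    const char *prefix = "Xeon(R) W-";
    const char *p = strstr(brand, prefix);
    unsigned model;

    if (p == NULL)
        return -1;                      /* not a Xeon W brand string */

    p += strlen(prefix);
    if (sscanf(p, "%u", &model) != 1)
        return -1;                      /* no model number after "W-" */

    /* Models named in the thread as exceptions to "all Xeon W have 2". */
    if (model == 2102 || model == 2104)
        return 1;

    /* Assumed range for the remaining first-generation W-21xx parts. */
    if (model >= 2120 && model <= 2199)
        return 2;

    return -1;                          /* e.g. W-22xx, W-32xx, W-3175X */
}
```

With the string from the PR description, `example_xeon_w_vpu_count("Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz")` yields 2, while a W-2102 string yields 1. Returning -1 for unlisted models leaves the choice of default to the caller.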

devinamatthews · 2019-10-07T20:48:12Z

Ah, 2102 and 2104 are conveniently omitted from https://ark.intel.com/content/www/us/en/ark/products/series/125035/intel-xeon-w-processor.html.

It would be great if you could fix up the logic. The only other way to detect is to run and time two loops (highly unrolled): one over fma only and one with fma+permute. If they take the same amount of time then there are 2 VPUs.

loveshack · 2019-10-08T11:00:23Z

> Ah, 2102 and 2104 are conveniently omitted from https://ark.intel.com/content/www/us/en/ark/products/series/125035/intel-xeon-w-processor.html.
>
> It would be great if you could fix up the logic.

I'm not sure how to do it reliably without a systematic technique or at least assuming the answer is two for future processors. I don't know how complete Jeff's code is, but maybe it's best just to use that if you're happy with the licence notice. I mis-spoke about it not recognizing i9/i7 -- maybe I was looking at an old version. It does miss D- series, at least, but it looks as if you want those counted as haswell. There are also, for instance, i9s with avx512 but no fma count documented; does that mean they don't have fma? [If a single unit isn't useful for GEMM, I wonder what it is useful for. I haven't researched that, and really can't keep up...]

> The only other way to detect is to run and time two loops (highly unrolled): one over fma only and one with fma+permute. If they take the same amount of time then there are 2 VPUs.

Is that a reasonable way to do it -- as a fallback? -- for dispatch at run time? (The code at https://github.com/jeffhammond/vpu-count/blob/master/empirical.c is icc-specific, and isn't usable without a licence.)

jeffhammond · 2019-10-08T14:33:02Z

If you want to use the empirical code, just build a standalone binary and run it during configure.

Running the empirical test as part of a BLAS library is gross (which is why I created my repo in the first place).

jeffhammond · 2019-10-08T14:36:38Z

> There are also, for instance, i9s with avx512 but no fma count documented; does that mean they don't have fma?

I asked the product marketing owner to get the answer.

> If a single unit isn't useful for GEMM, I wonder what it is useful for.

It's complicated. 1 FMA doesn't help GEMM because the frequency is lower with 1x512 than 2x256 (don't ask me why). AVX3 (aka AVX-512) has other uses besides GEMM where it has upside versus AVX2.
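The per-cycle arithmetic behind the 1x512 versus 2x256 comparison can be sketched as follows. The function name is hypothetical, and the model counts only double-precision FMA lanes per cycle; clock frequency is deliberately left out.

```c
/* Peak double-precision flops per cycle per core: each 64-bit lane of
   each FMA unit performs one multiply and one add per cycle. */
unsigned example_peak_dp_flops_per_cycle(unsigned fma_units,
                                         unsigned vector_bits)
{
    return fma_units * (vector_bits / 64u) * 2u;
}
```

One 512-bit unit and two 256-bit units both come to 16 flops per cycle, so on a single-unit part any drop in clock frequency while running 512-bit code leaves it behind the 2x256 path for GEMM.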

devinamatthews · 2019-10-08T15:15:00Z

I guess if @jeffhammond is happy tracking down the specs for all future AVX-512 products then we can just use his version. @fgvanzee ?

loveshack · 2019-10-08T15:35:59Z

> If you want to use the empirical code, just build a standalone binary and run it during configure. Running the empirical test as part of a BLAS library is gross (which is why I created my repo in the first place).

The code isn't distributable without a licence (or buildable without icc). I'm more interested in the x86_64 target than auto. I agree that's probably not something you want to do at run time, but I wonder whether something like that is better than returning inferior results on suitable hardware. (I don't know how long it takes.) The repo is obviously helpful, thanks. Is it known what MKL does? I assume it has to do the same dance if the hardware doesn't report the info.

loveshack · 2019-10-08T15:39:23Z

> It's complicated. 1 FMA doesn't help GEMM because the frequency is lower with 1x512 than 2x256 (don't ask me why). AVX3 (aka AVX-512) has other uses besides GEMM where it has upside versus AVX2.

Yes, I realize it's complicated... I guess this isn't the place for discussion. Anyhow, I hadn't realized a single unit ran slower, thanks.

fgvanzee · 2019-10-08T16:04:21Z

> If you want to use the empirical code, just build a standalone binary and run it during configure. Running the empirical test as part of a BLAS library is gross (which is why I created my repo in the first place).

They are both gross (albeit different magnitudes of grossness). I'd rather try to code all of the model number logic if that is just a few tweaks away from what we have already.

> It's complicated. 1 FMA doesn't help GEMM because the frequency is lower with 1x512 than 2x256 (don't ask me why). AVX3 (aka AVX-512) has other uses besides GEMM where it has upside versus AVX2.

I agree that all BLIS needs to care about is whether AVX-512 is supported. Thanks for reminding us of this, Jeff.

> I guess if @jeffhammond is happy tracking down the specs for all future AVX-512 products then we can just use his version. @fgvanzee ?

Agreed. (I assume by "his version" you mean the code that is currently in BLIS?)

I see two paths forward:

1. We try to finish Dave's PR to the best of our abilities in case VPU count ever does matter to us in the future;
2. We abort the PR.

Am I missing anything?

devinamatthews · 2019-10-08T16:17:03Z

@fgvanzee I meant copy https://github.com/jeffhammond/vpu-count/blob/master/vpu-count.c over periodically and adjust the interface as needed. It is MIT licensed so shouldn't be an issue.

loveshack · 2019-10-09T16:29:47Z

> I agree that all BLIS needs to care about is whether AVX-512 is supported. Thanks for reminding us of this, Jeff.

?? If that's the case, you don't need to check for multiple FMA units (conditional on avx512), which Jeff says is necessary.

> I guess if @jeffhammond is happy tracking down the specs for all future AVX-512 products then we can just use his version. @fgvanzee ?

I think I've got further with that than Jeff has, per an issue against his repo. Also, the parsing in BLIS actually looks more robust, given the change I needed for W- compared with what Jeff had reported.

devinamatthews · 2019-10-09T16:31:08Z

> I think I've got further with that than Jeff has, per an issue against his repo.

Great! What I meant is that I am happy for anyone but me to keep it up to date 😄.

loveshack · 2019-10-09T16:31:56Z

It sounds as if you don't want it, but as I'd done most of the work already, I pushed changes to my branch to update the models supported and allow reporting the number of VPUs chosen and architecture selected generally. That might save someone effort, but I may well have made mistakes, especially as I didn't try to scrape the ARK listings. I assumed ranges of models similar to how it was done already.
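An environment-controlled report of this kind could be sketched roughly as below. This is not the code in the branch: the variable name `EXAMPLE_ARCH_DEBUG` and all function names are hypothetical, chosen only to show the shape of the mechanism.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, for illustration only. */
#define EXAMPLE_DEBUG_VAR "EXAMPLE_ARCH_DEBUG"

/* Interprets the variable's value: unset, empty, or "0" means off. */
int example_value_enables_log(const char *value)
{
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Returns 1 if logging was requested through the environment. */
int example_log_enabled(void)
{
    return example_value_enables_log(getenv(EXAMPLE_DEBUG_VAR));
}

/* Reports the VPU count found and the sub-configuration selected. */
void example_log_selection(int vpus, const char *config)
{
    if (example_log_enabled())
        fprintf(stderr, "example: vpu_count=%d, selected '%s'\n",
                vpus, config);
}
```

Running a program with `EXAMPLE_ARCH_DEBUG=1` set would then print which configuration was chosen and why, which is the diagnostic needed when an SKX part is dispatched as Haswell because its VPU count is unknown or wrong.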

fgvanzee · 2020-01-01T19:17:11Z

@loveshack Dave, I apologize for letting this issue slip through the cracks. I think what happened here is that I had trouble following along with some of the conversation, which caused me to place the issue on the back burner until everyone with an interest in it worked out their differences / came to a consensus, but then I failed to realize when that pan was done simmering. :)

I'll start taking a look at this shortly, and hopefully others can chime in to help us get this PR resolved.

Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that relates to single-VPU Skylake-Xs.

fgvanzee · 2020-01-03T20:10:23Z

@loveshack I resolved the trivial conflict in the copyright portion of the license header to bli_cpuid.c. I think we're nearly done.

@devinamatthews If you have a moment, please comment on this before we merge.

* Fix parsing in vpu_count on workstation SKX
* Document Skylake-X as Haswell for single FMA
* Update vpu_count for Skylake and Cascade Lake models
* Support printing the configuration selected, controlled by the environment

  Intended particularly for diagnosing mis-selection of SKX through unknown, or incorrect, number of VPUs.
* Move bli_log outside the cpp condition, and use it where intended
* Add Fixme comment (Skylake D)
* Mostly superficial edits to commits towards flame#351.

  Details:
  - Moved architecture/sub-config logging-related code from bli_cpuid.c to bli_arch.c, tweaked names, and added more set/get layering.
  - Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
  - Content, whitespace changes to new bullet in HardwareSupport.md that relates to single-VPU Skylake-Xs.
* Fix comment typos

Co-authored-by: Field G. Van Zee <[email protected]>

Fix parsing in vpu_count on workstation SKX

f811608

loveshack mentioned this pull request Oct 7, 2019

Workstation SKX is mis-identified #352

Open

devinamatthews reviewed Oct 7, 2019

Document Skylake-X as Haswell for single FMA

30de168

loveshack added 2 commits October 9, 2019 11:21

Update vpu_count for Skylake and Cascade Lake models

8944b6d

Support printing the configuration selected, controlled by the environment

Intended particularly for diagnosing mis-selection of SKX through unknown, or incorrect, number of VPUs.

94b34d3

loveshack mentioned this pull request Oct 9, 2019

W- series mis-identified jeffhammond/vpu-count#2

Closed

loveshack added 2 commits October 10, 2019 12:26

Move bli_log outside the cpp condition, and use it where intended

5597169

Add Fixme comment (Skylake D)

822ddf9

fgvanzee and others added 4 commits January 1, 2020 16:37

Fix comment typos

339f5d3

Merge branch 'vpu_count' of github.com:loveshack/blis into vpu_count

314e767

Merge branch 'master' into vpu_count

2f6c94a

fgvanzee merged commit f391b3e into flame:master Jan 6, 2020

decandia50 mentioned this pull request Oct 9, 2020

Ability to set CPU configuration at runtime #451

Open
