Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing in vpu_count on workstation SKX #351

Merged
merged 10 commits into from
Jan 6, 2020

Conversation

loveshack
Copy link
Contributor

This correctly treats
Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz

Copy link
Member

@devinamatthews devinamatthews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all Xeon W have 2 VPUs, we should probably just check the "W" part and return 2. The current code doesn't handle new processors W-32XX, W-22XX, and W-3175X.

@loveshack
Copy link
Contributor Author

loveshack commented Oct 7, 2019 via email

@devinamatthews
Copy link
Member

Ah, 2102 and 2104 are conveniently omitted from https://ark.intel.com/content/www/us/en/ark/products/series/125035/intel-xeon-w-processor.html.

It would be great if you could fix up the logic. The only other way to detect is to run and time two loops (highly unrolled): one over fma only and one with fma+permute. If they take the same amount of time then there are 2 VPUs.

@loveshack
Copy link
Contributor Author

loveshack commented Oct 8, 2019 via email

@jeffhammond
Copy link
Member

If you want to use the empirical code, just build a standalone binary and run it during configure. Running the empirical test as part of a BLAS library is gross (which is why I created my repo in the first place).

@jeffhammond
Copy link
Member

There are also, for instance, i9s with avx512 but no fma count documented; does that mean they don't have fma?

I asked the product marketing owner to get the answer.

If a single unit isn't useful for GEMM, I wonder what it is useful for.

It's complicated. 1 FMA doesn't help GEMM because the frequency is lower with 1x512 than 2x256 (don't ask me why). AVX3 (aka AVX-512) has other uses besides GEMM where it has upside versus AVX2.

@devinamatthews
Copy link
Member

I guess if @jeffhammond is happy tracking down the specs for all future AVX-512 products then we can just use his version. @fgvanzee ?

@loveshack
Copy link
Contributor Author

loveshack commented Oct 8, 2019 via email

@loveshack
Copy link
Contributor Author

loveshack commented Oct 8, 2019 via email

@fgvanzee
Copy link
Member

fgvanzee commented Oct 8, 2019

If you want to use the empirical code, just build a standalone binary and run it during configure. Running the empirical test as part of a BLAS library is gross (which is why I created my repo in the first place).

They are both gross (albeit different magnitudes of grossness). I'd rather try to code all of the model number logic if that is just a few tweaks away from what we have already.

It's complicated. 1 FMA doesn't help GEMM because the frequency is lower with 1x512 than 2x256 (don't ask me why). AVX3 (aka AVX-512) has other uses besides GEMM where it has upside versus AVX2.

I agree that all BLIS needs to care about is whether AVX-512 is supported. Thanks for reminding us of this, Jeff.

I guess if @jeffhammond is happy tracking down the specs for all future AVX-512 products then we can just use his version. @fgvanzee ?

Agreed. (I assume by "his version" you mean the code that is currently in BLIS?)

I see two paths forward:

  • We try to finish Dave's PR to the best of our abilities in case VPU count ever does matter to us in the future;
  • We abort the PR.

Am I missing anything?

@devinamatthews
Copy link
Member

@fgvanzee I meant copy https://github.com/jeffhammond/vpu-count/blob/master/vpu-count.c over periodically and adjust the interface as needed. It is MIT licensed so shouldn't be an issue.

…nment

Intended particularly for diagnosing mis-selection of SKX through
unknown, or incorrect, number of VPUs.
@loveshack
Copy link
Contributor Author

loveshack commented Oct 9, 2019 via email

@devinamatthews
Copy link
Member

I think I've got further with that than Jeff has, per an issue against
his repo.

Great! What I meant is that I am happy for anyone but me to keep it up to date 😄.

@loveshack
Copy link
Contributor Author

loveshack commented Oct 9, 2019 via email

@fgvanzee
Copy link
Member

fgvanzee commented Jan 1, 2020

@loveshack Dave, I apologize for letting this issue slip through the cracks. I think what happened here is that I had trouble following along with some of the conversation, which caused me to place the issue on the back burner until everyone with an interest in it worked out their differences / came to a consensus, but then I failed to realize when that pan was done simmering. :)

I'll start taking a look at this shortly, and hopefully others can chime in to help us get this PR resolved.

fgvanzee and others added 4 commits January 1, 2020 16:37
Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
  to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
  relates to single-VPU Skylake-Xs.
@fgvanzee
Copy link
Member

fgvanzee commented Jan 3, 2020

@loveshack I resolved the trivial conflict in the copyright portion of the license header to bli_cpuid.c. I think we're nearly done.

@devinamatthews If you have a moment, please comment on this before we merge.

@fgvanzee fgvanzee merged commit f391b3e into flame:master Jan 6, 2020
pradeeptrgit pushed a commit to amd/blis that referenced this pull request Jun 30, 2020
* Fix parsing in vpu_count on workstation SKX

* Document Skylake-X as Haswell for single FMA

* Update vpu_count for Skylake and Cascade Lake models

* Support printing the configuration selected, controlled by the environment

Intended particularly for diagnosing mis-selection of SKX through
unknown, or incorrect, number of VPUs.

* Move bli_log outside the cpp condition, and use it where intended

* Add Fixme comment (Skylake D)

* Mostly superficial edits to commits towards flame#351.

Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
  to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
  relates to single-VPU Skylake-Xs.

* Fix comment typos

Co-authored-by: Field G. Van Zee <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants