codegen optimizations for unions #21279

Merged
merged 2 commits into from
Apr 6, 2017

vtjnash
Sponsor Member

@vtjnash vtjnash commented Apr 4, 2017

This seems to generate code that LLVM is better at optimizing. Will definitely need to check nanosoldier, though, to see whether this triggers any regressions elsewhere.

@vtjnash vtjnash added the compiler:codegen (Generation of LLVM IR and native code) and performance (Must go faster) labels Apr 4, 2017
@vtjnash vtjnash requested a review from Keno April 4, 2017 22:46
@Keno
Member

Keno commented Apr 4, 2017

Instcombine is quite expensive. Can you get away with instsimplify for your purpose?

@vtjnash
Sponsor Member Author

vtjnash commented Apr 5, 2017

Ah, I was assuming it was cheap since we call it very often. I don't actually need it.

@ararslan
Member

ararslan commented Apr 5, 2017

Looks like this broke the cfunction round-trip test on x86-64

SROA likes this form better

Also, since many of these loop variables are loop-dependent,
it helps to run the loop structure analysis passes twice
@vtjnash
Sponsor Member Author

vtjnash commented Apr 5, 2017

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@martinholters
Member

Are the "simd" regressions real? I don't remember those as particularly noisy.

@ararslan
Member

ararslan commented Apr 5, 2017

We can always run again to see if the results are the same.

@nanosoldier runbenchmarks(ALL, vs=":master")

@vtjnash
Sponsor Member Author

vtjnash commented Apr 5, 2017

They seem to be "quasi-real", but the reason is hilarious. With the extra passes, LLVM is able to notice that the loop is computing the number 0 and can elide all of the SIMD computational work and replace it with a simple memset. So we end up profiling the quality of the system memset function 😆
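
To illustrate the kind of rewrite described here, a minimal sketch in Python (chosen only for brevity; the real transformation happens on LLVM IR). The kernel `zero_loop` and the helper `zero_fill` are hypothetical and are not taken from the benchmark suite. The sketch only shows that a loop whose every iteration provably stores 0 is equivalent to a plain zero fill, which is what a memset does.

```python
def zero_loop(xs):
    """Hypothetical kernel: each iteration does arithmetic that always yields 0."""
    out = [0] * len(xs)
    for i, x in enumerate(xs):
        # For integers, x - x is always 0, so the per-element work is redundant.
        out[i] = x - x
    return out


def zero_fill(n):
    """What an optimizer may reduce the loop to: a zero fill of n elements."""
    return [0] * n


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    # Both paths produce the same buffer; only the cost of producing it differs.
    print(zero_loop(data) == zero_fill(len(data)))
```

Once the loop is reduced this way, a benchmark of it no longer measures the arithmetic at all, only how fast the zero fill runs.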

@ararslan
Member

ararslan commented Apr 5, 2017

That is pretty amusing. Why would that cause a regression though? The system's memset could actually be slower than the full SIMD computations?

@vtjnash
Sponsor Member Author

vtjnash commented Apr 5, 2017

> The system's memset could actually be slower than the full SIMD computations?

Yes. It is entirely reliant on the glibc version supporting the max vector width for the machine. The bottleneck is waiting for memory, making the SIMD calculations essentially "free".

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@vtjnash vtjnash merged commit 11682d8 into master Apr 6, 2017
@vtjnash vtjnash deleted the jn/codegen-opt-unions branch April 6, 2017 14:36

5 participants