Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix computation of cache-line size on Power #6515

Merged
merged 1 commit into from
May 16, 2022

Conversation

keithc-ca
Copy link
Member

@keithc-ca keithc-ca commented May 12, 2022

We need to inform the compiler that our uses of dcbz clobber memory, otherwise the optimizer can reasonably assume, e.g. in omrcpu_startup(), that buf[0] == 255 at the beginning of the for loop.

:"r"((void *) &buf[512]));
memset(buf, 255, sizeof(buf));

__asm__ __volatile__(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the __volatile__ isn't needed. The port library copy doesn't use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nm, I see you just added it here. I guess it doesn't hurt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was used in both places in j9memclr.cpp and won't hurt. In fact, it may save us from a future compiler optimization.

@pshipton
Copy link
Member

jenkins build all

@pshipton
Copy link
Member

pshipton commented May 16, 2022

@jdmpapin @dsouzai is there a committer that can look at this?

@jdmpapin
Copy link
Contributor

The changes LGTM, but can the cache line size detection be factored out into a single place?

@keithc-ca
Copy link
Member Author

The changes LGTM, but can the cache line size detection be factored out into a single place?

I'll look into whether there's a clear path to that.

@keithc-ca
Copy link
Member Author

Updated as suggested, reusing getCacheLineSize() in omrcpu_startup() and using uint32_t as the storage type for the cache line size everywhere.

@jdmpapin
Copy link
Contributor

The description of the functional fix (i.e. the memory clobber) seems to have gone missing from the commit message. It's probably more important now than it was before, since there are more changes

The compiler needs to be told that our uses of dcbz clobber memory,
otherwise the optimizer can reasonably assume in getCacheLineSize(),
that each element of buf still has the value (255) assigned by memset().

* reuse getCacheLineSize() in omrcpu_startup()
* use uint32_t consistently for cache line size

Signed-off-by: Keith W. Campbell <[email protected]>
@keithc-ca
Copy link
Member Author

That description was never in the commit message itself (only in the description here); it's now also in the commit message.

@jdmpapin
Copy link
Contributor

Jenkins build all

@jdmpapin jdmpapin merged commit c67d754 into eclipse:master May 16, 2022
@keithc-ca keithc-ca deleted the ppc_cache_line_size branch May 17, 2022 12:17
keithc-ca added a commit to keithc-ca/omr that referenced this pull request Jun 23, 2022
This allows the compiler to do the widening operation once,
rather than repeatedly in loops, fixing a performance regression
introduced in eclipse#6515.

Signed-off-by: Keith W. Campbell <[email protected]>
keithc-ca added a commit to keithc-ca/omr that referenced this pull request Jun 23, 2022
This allows the compiler to do the widening operation once,
rather than repeatedly in loops, fixing a performance regression
introduced in eclipse#6515.

Signed-off-by: Keith W. Campbell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants