Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GR-42740] [GR-44248] Experimental support for JFR event streaming. #6146

Merged
merged 81 commits into from
Mar 10, 2023

Conversation

graalvmbot
Copy link
Collaborator

No description provided.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Mar 7, 2023
@christianhaeubl
Copy link
Member

@roberttoyonaga : this is basically your PR with a bunch of changes on top because I ran into larger issues:

  • Locking now happens on the JfrBufferNode and no longer on the JfrBuffer.
  • Added a retired flag because Java-level JfrBuffers can't be freed right away.
  • Disabled flushing of stack traces because it would break the stack trace data.
  • Fixed and improved the test cases.
  • Fixed various other concurrency-related issues.
  • Fixed various issues related to memory management.
  • Various refactorings, cleanups, and more documentation.

There are still a few known issues:

  • If flushing is disabled, then we leak JfrBufferNodes (those nodes are only cleaned up when JFR recording stops).
  • When flushing the JfrRepository data, it can happen that repositories don't contain any unflushed data. In that case, the JfrChunkWriter writes some unnecessary information (i.e., that the repository was empty).
  • destroyJfr() still needs some improvements.

Feel free to review the changes.

@@ -723,14 +716,14 @@ protected void operate() {

/* No further JFR events are emitted, so free all JFR-related buffers. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment might not be true anymore. We flush all the local buffers to the global buffers, but the java local buffers do not get freed here only retired.

public void teardown() {
JfrBufferNode node = head;
while (node.isNonNull()) {
assert node.getBuffer().isNull();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assertion and the Javadoc above always true? When destroyJfr is called and the java buffer node list is being torn down, when are the JfrBuffers freed before reaching here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this is one of the open topics. I need to think about this a bit more because I don't have a good solution yet.

@roberttoyonaga
Copy link
Collaborator

Hi Christian, thank you for resolving those issues. The new changes seem reasonable to me. I only have a few questions.

  • If flushing is disabled, then we leak JfrBufferNodes (those nodes are only cleaned up when JFR recording stops

But even if there is no flushing, won't the node lists still be traversed during an epoch change, and old nodes cleaned up?

@christianhaeubl
Copy link
Member

But even if there is no flushing, won't the node lists still be traversed during an epoch change, and old nodes cleaned up?

That is right, I forgot to mention that. The problem is that a lot of time can pass before an epoch change happens (i.e., a lot of threads could start and die in that time). I think this is something that we should address at some point but it isn't super important because it most likely won't cause any issues in practice.

@roberttoyonaga
Copy link
Collaborator

a lot of threads could start and die in that time

This is what I was a bit concerned about before, and why I had originally had an eager cleanup scheme where nodes attempt to be removed once their threads die. Do you think we should try to re-implement something similar to that at some time in the future?

currentConstantPoolPosition += deltaNext;
} while (deltaNext != 0);

/* Now that we collected all data, verify and clear it. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to check with compareFoundAndExpectedIds() after each constant pool? This way we can validate each flush individually to make sure there's no unexpected references at any intermediate point?

Copy link
Member

@christianhaeubl christianhaeubl Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, we can indeed verify the accumulated constant pool data after every checkpoint. I changed this so that the code is now similar to what you had in place before.

@christianhaeubl
Copy link
Member

This is what I was a bit concerned about before, and why I had originally had an eager cleanup scheme where nodes attempt to be removed once their threads die. Do you think we should try to re-implement something similar to that at some time in the future?

Yes, but I think that we can use something simple, like an explicit lock for iterating/removing nodes (instead of relying on the JfrChunkWriter lock).

@graalvmbot graalvmbot merged commit 6238398 into master Mar 10, 2023
@graalvmbot graalvmbot deleted the chaeubl/GR-42740 branch March 10, 2023 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants