[Spike] Investigate memory use of HttpUrlConnection instrumentation #1750

Closed

meiao opened this issue Feb 20, 2024 · 14 comments
Labels: 2 Story Point Estimate · apr-jun qtr (proposed work item for the Apr-Jun quarter) · GTSE (there is an associated support escalation with this issue) · spike (research spike; technical solution needing discovery)

Comments

meiao (Contributor) commented Feb 20, 2024

Description

A customer noticed unusual memory behavior in their application. The behavior was consistent with a memory leak, but no OOM error was reported. This happens with agent versions 8.7.0 and newer.

The customer investigated and found that most of the memory was held in a weak-keyed map inside HttpUrlConnection_1016704860_nr_ext, a class the agent generates at runtime to hold the fields introduced by the @NewField annotation. They also pointed out that this may be related to the changes introduced by PR #1537. The suspicion is that the instrumentation is not reaching MetricState.reportExternalCall, which would null out its dtTracer and externalTracer fields.
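
For context, a minimal sketch of the suspected pattern (the names dtTracer, externalTracer, and reportExternalCall come from the report; everything else here is hypothetical, not the agent's actual code):

public class MetricState {
    // Tracers set while an external call is in flight. The generated
    // *_nr_ext extension class keeps this state reachable via its
    // weak-keyed map for as long as these fields are non-null.
    private Object dtTracer;
    private Object externalTracer;

    void startExternalCall(Object tracer) {
        externalTracer = tracer; // held until the call is reported
    }

    void reportExternalCall() {
        // The cleanup suspected to never run: if this method is not
        // reached, the tracers (and whatever they reference) stay live.
        dtTracer = null;
        externalTracer = null;
    }
}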


meiao (Contributor, Author) commented Feb 20, 2024

While trying to reproduce this, a simple app that makes requests was created. It creates a transaction and makes an HTTP request to localhost; a server running locally responds to that request.

import com.newrelic.api.agent.Trace;

import java.net.HttpURLConnection;
import java.net.URL;

public class Repro {
    public static void main(String[] args) throws Exception {
        // Hammer the instrumentation from 20 threads, each looping forever.
        for (int i = 0; i < 20; i++) {
            new Thread(() -> {
                while (true) {
                    try {
                        doCall();
                    } catch (Exception ex) {
                        // keep on
                    }
                }
            }).start();
        }
    }

    // dispatcher = true starts a new transaction for each call.
    @Trace(dispatcher = true)
    public static void doCall() throws Exception {
        URL url = new URL("http://localhost:8080");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.connect();
    }
}
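
The local server is not included above; any responder on port 8080 works. A minimal sketch using the JDK's built-in HTTP server (an assumption, not necessarily the server used in the original test):

import com.sun.net.httpserver.HttpServer;

import java.io.OutputStream;
import java.net.InetSocketAddress;

public class LocalServer {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            // Answer every request with a tiny 200 response.
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}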

After running this application for a few minutes, SlowTransactionService was holding over 100k transactions and the application was showing the symptoms of a memory leak.
This may or may not be related to the behavior the customer reported; it could also be something different.
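
(For anyone trying to confirm this kind of retention themselves, one option is to trigger a heap dump from inside the JVM and inspect it in MAT; jmap -dump:live,format=b,file=heap.hprof <pid> works as well. A sketch using the standard HotSpotDiagnosticMXBean, not part of the original repro:)

import com.sun.management.HotSpotDiagnosticMXBean;

import java.lang.management.ManagementFactory;

public class HeapDump {
    public static void dump(String path) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, true); // true = dump only live (reachable) objects
    }
}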

@kford-newrelic kford-newrelic added GTSE There is an associated support escalation with this issue. estimate Issue needing estimation labels Feb 21, 2024
eNorby1 commented Feb 26, 2024

Hi, we have also been experiencing a memory leak: our application's G1 Old Gen heap usage kept rising until it eventually hit an OOM after a couple of weeks. Using MAT, we traced the leak suspect to New Relic's HttpUrlConnection instrumentation (see screenshot below). Our application runs on Java 21, so we cannot try agent versions older than 8.7.0 and cannot tell when the issue was first introduced. For now we had to disable the New Relic agent completely to prevent the memory issues.

[Screenshot (2024-02-26): MAT leak suspect report]

If there's any info I can provide to help get this fixed, let me know.

meiao (Contributor, Author) commented Feb 26, 2024

@eNorby1 do you have slow transaction detection on?
If so, can you try disabling it by adding the following to newrelic.yml?

common:
  slow_transactions:
    enabled: false

eNorby1 commented Feb 26, 2024

> @eNorby1 do you have slow transaction detection on? If so, can you try disabling it by adding the following to newrelic.yml?
>
> common:
>   slow_transactions:
>     enabled: false

I'll give it a try and report back

@kford-newrelic kford-newrelic changed the title Investigate memory use of HttpUrlConnection instrumentation [Spike] Investigate memory use of HttpUrlConnection instrumentation Feb 26, 2024
@kford-newrelic kford-newrelic added 2 Story Point Estimate and removed estimate Issue needing estimation labels Feb 26, 2024
@kford-newrelic kford-newrelic added the apr-jun qtr Represents proposed work item for the Apr-Jun quarter label Feb 27, 2024
eNorby1 commented Feb 28, 2024

Hi @meiao, the problem seems to be the same with slow transaction detection disabled. It would take a couple of weeks to actually hit OOM, but memory can already be seen climbing consistently after redeploying with the New Relic agent re-enabled:
[Screenshot (2024-02-28, 09:08): heap usage climbing after redeploy]
[Screenshot (2024-02-28, 09:11): heap usage climbing after redeploy]

meiao (Contributor, Author) commented Feb 28, 2024

@eNorby1 Can you provide an example of how you are using the HttpUrlConnection object?
For example:

var connection = (HttpURLConnection) new URL("http://example.org").openConnection();
int responseCode = connection.getResponseCode();
...

eNorby1 commented Feb 29, 2024

> @eNorby1 Can you provide an example of how you are using the HttpUrlConnection object? For example:
>
> var connection = (HttpURLConnection) new URL("http://example.org").openConnection();
> int responseCode = connection.getResponseCode();
> ...

I'm not using it directly; the calls are made via Spring Cloud OpenFeign.
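
For reference, a minimal hypothetical Feign client (the name and URL are illustrative, not from this report). Unless another HTTP client is configured, Spring Cloud OpenFeign's default feign.Client.Default opens connections through java.net.HttpURLConnection, which is what brings this instrumentation into play:

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;

// Hypothetical client for illustration only.
@FeignClient(name = "example", url = "http://localhost:8080")
public interface ExampleClient {
    @GetMapping("/")
    String fetch();
}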

meiao (Contributor, Author) commented Feb 29, 2024

That's helpful, we'll try to reproduce using that. Thanks.

WTThomas1 commented Apr 17, 2024

In case it's helpful, I believe we may be seeing the same issue and/or something related:

[Screenshot: memory usage graph]

This started after updating from 8.6 to 8.10 (then downgrading to 8.9.1 after discovering the deadlock issue the hard way).

The vast majority of the codebase is HCL DX / IBM Portal based and does not use HttpURLConnection directly. However, I do see one direct usage of HttpURLConnection as follows:

final URL url = new URL( apiUrl );

...

final HttpURLConnection conn = (HttpURLConnection) url.openConnection();

conn.setRequestMethod( "POST" );
conn.setRequestProperty( "Content-Type", "application/x-www-form-urlencoded" );
conn.setRequestProperty( "Content-Length", String.valueOf( postDataBytes.length ) );
conn.addRequestProperty( "Authorization", basicAuthPayload );
conn.setDoOutput( true );

conn.getOutputStream().write( postDataBytes );

final Reader in = new BufferedReader( new InputStreamReader( conn.getInputStream(), "UTF-8" ) );

Based on its location in the code, this would not be triggered frequently: on the order of 20 calls per week, whereas the graph above covers roughly 2 days.

@jasonjkeller jasonjkeller self-assigned this Apr 17, 2024
@jasonjkeller jasonjkeller moved this from In Quarter to In Sprint in Java Engineering Board Apr 17, 2024
@jasonjkeller jasonjkeller removed their assignment Apr 17, 2024
@jasonjkeller jasonjkeller moved this from In Sprint to In Quarter in Java Engineering Board Apr 17, 2024
@jbedell-newrelic jbedell-newrelic self-assigned this May 7, 2024
@jbedell-newrelic jbedell-newrelic moved this from In Quarter to In Sprint in Java Engineering Board May 9, 2024
jbedell-newrelic (Contributor) commented:

@eNorby1 and @WTThomas1 We believe we may have identified the source of the leak. Would you be able to take this custom JAR (from this build) and see if that resolves your issue? I would advise doing this in a lower environment, since it's a one-off build of the agent.

@jbedell-newrelic jbedell-newrelic moved this from In Sprint to Needs Review in Java Engineering Board May 20, 2024
WTThomas1 commented:

Good Afternoon,

I do apologize. We don't really have enough traffic in lower environments to be able to effectively test this issue. For multiple reasons, it really isn't possible to test in production at this point. We're frozen at 8.6 until this issue is resolved (and the associated release has been available / in use without issues for a while).

Given our other projects and deadlines, I don't know that I can really be much help at this point (at least until late June). If testing is still needed once we get to a point where I do have some time, I can see if I can put something together (and I'll update if that is the case).

WTT

eNorby1 commented May 21, 2024

Hi @jbedell-newrelic, pretty much the same for me too -- we could never generate enough traffic in our staging environments to reproduce the issue; it needed to run for a few days in production before the memory leak became apparent.

jbedell-newrelic (Contributor) commented:

Okay, I've merged my changes in and they should be in the next release. If that doesn't resolve it for you, let us know.

@github-project-automation github-project-automation bot moved this from Needs Review to Code Complete/Done in Java Engineering Board May 28, 2024