Negative totalCallTime, totalExclusiveTime and very big sumOfSquares received #1891

Closed
KGOH opened this issue May 14, 2024 · 12 comments

Labels: bug (Something isn't working as designed/intended), wontfix (Issue will not be worked on)

Comments

@KGOH

KGOH commented May 14, 2024

Description

I receive stats with bad data from a single Java app, always within the same transaction, and it skews the whole system's metrics as a result. I haven't encountered this behavior on other Java apps with the same agent. callCount and min/max call time may be incorrect too, although they look plausible in this example:

callCount:          1742
totalCallTime:      -638.41095
totalExclusiveTime: -638.41095
minCallTime:        0.005053215
maxCallTime:        17.494854
sumOfSquares:       4.207051618499706e25 // (NOTE the e25 here)

callCount:          406
totalCallTime:      -2350984400
totalExclusiveTime: -2350984400
minCallTime:        0.00000238
maxCallTime:        0.011196813
sumOfSquares:       278452658588430.66

Expected Behavior

I expect:

  1. Positive totalCallTime and totalExclusiveTime.
  2. sumOfSquares <= maxCallTime*maxCallTime*callCount
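As a quick sanity check, here is a small standalone snippet (my own illustration, not agent code) that tests both expectations against the first reported sample; both checks fail for the data above.

// Standalone sanity check of the two expected invariants against the first
// reported sample. Illustration only -- not New Relic agent code.
public class StatsInvariantCheck {
    public static void main(String[] args) {
        long callCount = 1742;
        double totalCallTime = -638.41095;
        double maxCallTime = 17.494854;
        double sumOfSquares = 4.207051618499706e25;

        // 1. A sum of non-negative call durations can never be negative.
        System.out.println("totalCallTime >= 0: " + (totalCallTime >= 0)); // false

        // 2. Each squared duration is at most maxCallTime^2, so the sum of
        //    callCount squares is bounded by callCount * maxCallTime^2
        //    (about 5.3e5 here, many orders of magnitude below the reported 4.2e25).
        double bound = maxCallTime * maxCallTime * callCount;
        System.out.println("sumOfSquares <= bound: " + (sumOfSquares <= bound)); // false
    }
}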

Steps to Reproduce

I have no idea how to reproduce this. It's seemingly random behavior from a single app.

Your Environment

Java agent version 8.11.0, OpenJDK 17.0.8

Additional context

My problem looks related to this issue:
#709
and this PR:
https://github.com/newrelic/newrelic-java-agent/pull/1685/files

@KGOH added the bug label May 14, 2024
@kford-newrelic
Contributor

Hi @KGOH, thanks for letting us know about this issue. A few clarifying questions:

Have you noticed this occurring more than once? If yes, approximately how often does it occur? Any repeatable pattern that you can discern?

@KGOH
Author

KGOH commented May 14, 2024

@kford-newrelic this happens constantly, approximately every 15 minutes, but only for one specific transaction in one specific Java app. Many metrics within this transaction are broken, such as database access and so on. I suspect the transaction may be asynchronous, as I see the lettuce lib in the call stack.

@kford-newrelic
Contributor

Any chance you can create a repro app from that transaction that you can share with us?

If we can get a repro, we might stand a chance of figuring this out...

@KGOH
Author

KGOH commented May 14, 2024

@kford-newrelic unfortunately, I don't know how to reproduce this :(

@jtduffy
Contributor

jtduffy commented May 14, 2024

Is there anything unique about the problem transaction? Does it make external calls or use the Process API in any way? Does it use any sort of async framework? Would it be possible to get some source code or pseudo-code?

We're looking for any pointers to what the underlying cause might be, since we've been unable to repro this ourselves.

@KGOH
Author

KGOH commented May 15, 2024

@jtduffy here are the transaction metrics I managed to extract. I erased sensitive information, but this should still convey the overall idea of the technologies used:

|                                                                                                   :name | :avg_calls | :avg_time |
|---------------------------------------------------------------------------------------------------------+------------+-----------|
|                                                                       Datastore/operation/Redis/PEXPIRE |      5.214 |   -30.283 |
|                                                                         Datastore/operation/Redis/RPUSH |     29.951 |   -17.632 |
|                                                                          Datastore/operation/Redis/HGET |      6.600 |   -10.324 |
|                                                                                              Java/other |     23.758 |    -8.868 |
|                                                                 Datastore/statement/Postgres/.../select |     19.414 |    -6.124 |
|                                                                       Datastore/operation/Redis/HEXISTS |      1.196 |    -2.473 |
|                                                                           Datastore/operation/Redis/GET |      0.000 |    -2.529 |
|                                                             Datastore/statement/MongoDB/...Event/insert |      0.399 |    -1.712 |
|                                                                         Datastore/operation/Redis/HSCAN |      0.399 |    -1.712 |
|                             Java/io.lettuce.core.cluster.RedisAdvancedClusterAsyncCommandsImpl/dispatch |     18.740 |    46.605 |
|                                        Java/sun.net.www.protocol.http.HttpURLConnection/getResponseCode |     13.184 |    42.346 |
|                                        Java/sun.net.www.protocol.http.HttpURLConnection/getOutputStream |     12.194 |    35.980 |
|                           Java/org.apache.cxf.binding.soap.interceptor.SoapOutInterceptor/handleMessage |     14.187 |    34.856 |
|                                         Java/sun.net.www.protocol.http.HttpURLConnection/getInputStream |     10.981 |    33.421 |
|                                                   Java/com.zaxxer.hikari.HikariDataSource/getConnection |      9.171 |    28.940 |
|                                                                          Datastore/operation/Redis/HSET |     34.804 |    22.405 |
|                                                                           Datastore/operation/Redis/DEL |     33.911 |     9.934 |
|                                                    Java/io.lettuce.core.RedisAsyncCommandsImpl/dispatch |      0.787 |     6.850 |
|                                                                 Datastore/statement/Postgres/.../select |      0.598 |     3.483 |
|                                                                        Datastore/operation/Redis/LRANGE |      2.570 |     2.980 |
|                                    Java/org.apache.cxf.interceptor.StaxOutEndingInterceptor/handleFault |      0.094 |     2.127 |
|                                                             Datastore/statement/MongoDB/...Event/insert |      0.000 |     1.712 |
|                                                          External/.../HttpURLConnection/getResponseCode |     17.495 |     1.428 |
|                                                                          Datastore/operation/Redis/HDEL |      1.001 |     0.883 |
|                                                                        Datastore/operation/Redis/LINDEX |      8.631 |     0.710 |
|    Java/org.apache.cxf.binding.soap.interceptor.SoapOutInterceptor$SoapOutEndingInterceptor/handleFault |      0.024 |     0.364 |
|                                                                 Datastore/statement/Postgres/.../select |     19.419 |     0.253 |
| Filter/org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter/doFilter |      0.000 |     0.000 |
|     Java/org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor/handleFault |      0.374 |     0.000 |
|           Filter/org.springframework.security.web.authentication.www.BasicAuthenticationFilter/doFilter |      0.000 |     0.000 |
|            Filter/org.springframework.boot.web.servlet.DelegatingFilterProxyRegistrationBean$1/doFilter |      0.000 |     0.000 |
|              Filter/org.springframework.boot.web.servlet.filter.OrderedCharacterEncodingFilter/doFilter |      0.000 |     0.000 |
|                Filter/org.springframework.boot.actuate.metrics.web.servlet.WebMvcMetricsFilter/doFilter |      0.000 |     0.000 |
|                 Java/org.apache.cxf.binding.soap.interceptor.SoapHeaderOutFilterInterceptor/handleFault |      0.299 |     0.000 |
|                  Java/org.apache.cxf.binding.soap.interceptor.SoapPreProtocolOutInterceptor/handleFault |      0.000 |     0.000 |
|                    Filter/org.springframework.boot.web.servlet.filter.OrderedFormContentFilter/doFilter |      0.000 |     0.000 |
|                         Filter/org.springframework.security.web.session.DisableEncodeUrlFilter/doFilter |      0.000 |     0.000 |
|                           Java/sun.net.www.protocol.https.DelegateHttpsURLConnection/getResponseMessage |      0.000 |     0.000 |
|                           Java/org.apache.cxf.jaxws.interceptors.WrapperClassOutInterceptor/handleFault |      0.075 |     0.000 |
|                             Java/org.apache.cxf.binding.soap.interceptor.SoapOutInterceptor/handleFault |      0.002 |     0.016 |
|                              Filter/org.springframework.security.web.header.HeaderWriterFilter/doFilter |      0.000 |     0.000 |
|                               Java/sun.net.www.protocol.https.DelegateHttpsURLConnection/getInputStream |      0.000 |     0.000 |
|                                Java/org.glassfish.jersey.server.ServerRuntime$Responder/processResponse |      0.000 |     0.000 |
|                                 Java/org.apache.cxf.jaxws.interceptors.HolderOutInterceptor/handleFault |      0.005 |     0.000 |
|                                    Java/org.apache.cxf.wsdl.interceptors.BareOutInterceptor/handleFault |      0.006 |     0.068 |
|                                    Java/org.apache.cxf.jaxws.interceptors.SwAOutInterceptor/handleFault |      0.019 |     0.000 |
|                                    Java/org.apache.cxf.interceptor.MessageSenderInterceptor/handleFault |      0.000 |     0.000 |
|                                    Java/org.apache.cxf.interceptor.AttachmentOutInterceptor/handleFault |      0.000 |     0.001 |
|                                     Filter/org.springframework.web.filter.RequestContextFilter/doFilter |      0.000 |     0.000 |
|                                          Java/org.apache.cxf.ws.policy.PolicyOutInterceptor/handleFault |      0.001 |     0.000 |
|                                          Java/org.apache.cxf.interceptor.StaxOutInterceptor/handleFault |      0.001 |     0.008 |
|                                           Servlet/org.glassfish.jersey.servlet.ServletContainer/service |      0.000 |     0.000 |
|                                            Java/javax.servlet.ServletRequestListener/requestInitialized |      0.000 |     0.000 |
|                                             Filter/org.apache.tomcat.websocket.server.WsFilter/doFilter |      0.000 |     0.000 |
|                                              Java/org.glassfish.jersey.server.ApplicationHandler/handle |      0.000 |     0.000 |
|                                               Filter/org.springframework.web.filter.CorsFilter/doFilter |      0.000 |     0.000 |
|                                                         Java/com...LoggingOutputInterceptor/handleFault |      0.000 |     0.000 |
|                                                          External/.../HttpURLConnection/getResponseCode |      0.000 |     0.000 |
|                                                           Filter/com...RequestIdentifierFilter/doFilter |      0.000 |     0.000 |
|                                                             Datastore/statement/MongoDB/...Event/insert |      0.399 |     0.000 |
|                                                                 Datastore/statement/Postgres/.../select |      0.598 |     0.003 |
|                                                                 Datastore/statement/Postgres/.../select |      0.598 |     0.001 |
|                                                                     Datastore/operation/Redis/PEXPIREAT |      0.000 |     0.000 |
|                                                                        External/.../CommonsHttp/execute |      0.000 |     0.000 |
|                                                                        External/.../CommonsHttp/execute |      0.000 |     0.000 |
|                                                                        Datastore/operation/Redis/INCRBY |      0.000 |     0.000 |
|                                                                         Datastore/operation/Redis/SETEX |      0.000 |     0.000 |
|                                                                         Datastore/operation/Redis/HKEYS |      0.000 |     0.000 |
|                                                                           Java/com...RestImpl/search... |      0.000 |     0.000 |
|                                                                            Java/com...Service/search... |      0.000 |     0.000 |
|                                                                            Java/com...RestClient/get... |      0.000 |     0.000 |
|                                                                                 External/...Client/POST |      0.000 |     0.000 |

@KGOH
Author

KGOH commented May 15, 2024

For a counterexample, here's another transaction that also uses Redis and lettuce but never produces negative values (columns: name, average calls, average time):

                                                                       External/.../CommonsHttp/execute 3.66  174.81 ms
                                                                             Java/com...RestImpl/get... 1.00  49.51 ms
                                                                      Datastore/operation/Redis/PEXPIRE 39.87 20.45 ms
                                             Java/org.glassfish.jersey.server.ApplicationHandler/handle 0.80  5.95 ms
                                                                                             Java/other 63.98 3.21 ms
                                                                         Datastore/operation/Redis/HGET 7.00  2.95 ms
                                                                         Datastore/operation/Redis/HSET 1.95  1.02 ms
                                                   Java/io.lettuce.core.RedisAsyncCommandsImpl/dispatch 0.00  28.26 µs
                            Java/io.lettuce.core.cluster.RedisAdvancedClusterAsyncCommandsImpl/dispatch 0.02  21.83 µs
                                            Java/org.apache.http.impl.client.InternalHttpClient/execute 0.00  7.64 µs
                                           Java/javax.servlet.ServletRequestListener/requestInitialized 0.00  5.63 µs
                               Java/org.glassfish.jersey.server.ServerRuntime$Responder/processResponse 0.00  2.46 µs
                                                          Filter/com...RequestIdentifierFilter/doFilter 0.00  1.63 µs
          Filter/org.springframework.security.web.authentication.www.BasicAuthenticationFilter/doFilter 0.00  684.94 ns
                                          Servlet/org.glassfish.jersey.servlet.ServletContainer/service 0.00  440.40 ns
             Filter/org.springframework.boot.web.servlet.filter.OrderedCharacterEncodingFilter/doFilter 0.00  110.96 ns
                                    Filter/org.springframework.web.filter.RequestContextFilter/doFilter 0.00  103.87 ns
                                              Filter/org.springframework.web.filter.CorsFilter/doFilter 0.00  41.10 ns
Filter/org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter/doFilter 0.00  28.14 ns

@KGOH
Author

KGOH commented May 15, 2024

@jtduffy Is this info helpful?

@jtduffy
Contributor

jtduffy commented May 16, 2024

It's more data, but unfortunately I don't think it gets us closer to a solid repro app. This has been a sporadic issue for years and we've never been able to track it down. It just seems like there's no common denominator across applications when it does pop up.
Thanks again for the report and the information you provided.

@jasonjkeller
Contributor

@KGOH This is probably a long shot, but if the same transaction is still regularly generating these negative/large metric values every 15 minutes, can you go into the Metric Explorer and check if there are any metrics prefixed with Supportability/API/LegacyAsync/?

If there are, try adding this legacy_async_api_skip_suspend: true config to your agent config yaml (indented 2 spaces under the pre-existing common stanza):

common: &default_settings
  legacy_async_api_skip_suspend: true

I believe the agent should detect this yaml change and apply it dynamically without requiring a JVM restart; picking up the change usually only takes a few minutes. After 15-30 minutes pass, do you still see the negative/large metric values associated with the problematic transaction?

Again, this is a long shot, but based on some of the metrics for the problematic transaction I'm wondering if the async servlet/Jetty continuations APIs are being used under the hood and if this might be related to our legacy async API that traces such activity.
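For what it's worth, here is a simplified, purely hypothetical sketch (not New Relic agent code) of how suspend/resume bookkeeping in async tracing could produce a negative exclusive time if the same suspended interval were subtracted twice, e.g. once on the original thread and once on the thread that resumes the transaction:

// Hypothetical illustration only -- not New Relic agent code.
// Shows how exclusive-time accounting can go negative when a suspended
// interval is subtracted more than once during async suspend/resume.
public class ExclusiveTimeSketch {
    public static void main(String[] args) {
        long segmentWallClockMs = 120; // total wall-clock time of the traced segment
        long suspendedMs = 80;         // time the work was parked waiting on async I/O

        // Correct accounting: subtract the suspension exactly once.
        long exclusiveOk = segmentWallClockMs - suspendedMs;                   // 40 ms

        // Faulty accounting: both the suspending and the resuming side
        // subtract the same interval, driving the result below zero.
        long exclusiveBroken = segmentWallClockMs - suspendedMs - suspendedMs; // -40 ms

        System.out.println("exclusive time (correct):           " + exclusiveOk + " ms");
        System.out.println("exclusive time (double-subtracted): " + exclusiveBroken + " ms");
    }
}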

@kford-newrelic added the wontfix label Jun 28, 2024
@kford-newrelic
Contributor

@KGOH apologies, but we don't have enough information to investigate this further. If you later identify a pattern or other details that might help us reproduce this issue, please let us know.

@kford-newrelic closed this as not planned Jun 28, 2024