Memory Leak Suspect in ForwardOnlyResultSet #85

Closed · sgtcortez opened this issue Aug 16, 2022 · 15 comments · Fixed by #88

@sgtcortez commented Aug 16, 2022

Hi, I am so glad to be using this wonderful reactive driver for Oracle!
Thanks

Application

  • Java Version: 17.0.2 openjdk
  • oracle-r2dbc Version: 1.0.0
  • r2dbc-pool Version: 0.9.1
  • ojdbc Version: ojdbc11:21.3.0.0

Description

I have an application in production which serves a lot of data and gets many hits per second.
Basically, it uses the cache-aside technique with Redis:
go to Redis; if the key is not in Redis, go to Oracle, do some mappings, and store the result in Redis.

It works like a charm, but when the application runs for a day without interruption, I start to see memory consumption growth in our APM. The garbage collector cannot reclaim the memory, and the average memory usage grows and grows.

I took a heap dump, and with Eclipse Memory Analyzer I saw that the heap holds a lot of ForwardOnlyResultSet objects, which, it seems, are not closed after use.

Here we can see that we have a lot of instances of ForwardOnlyResultSet:
[screenshot: Eclipse Memory Analyzer view showing many ForwardOnlyResultSet instances]

We are using r2dbc-pool with a pool of 5 connections:
[screenshot: heap dump showing the pooled connections]
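
For context, the pool is created roughly like this (a minimal sketch; the connection options shown are illustrative placeholders, not our real configuration):

import io.r2dbc.pool.ConnectionPool;
import io.r2dbc.pool.ConnectionPoolConfiguration;
import io.r2dbc.spi.ConnectionFactories;
import io.r2dbc.spi.ConnectionFactory;
import io.r2dbc.spi.ConnectionFactoryOptions;

// Sketch of the r2dbc-pool setup; host/database/credentials are placeholders.
ConnectionFactory factory = ConnectionFactories.get(
    ConnectionFactoryOptions.builder()
        .option(ConnectionFactoryOptions.DRIVER, "oracle")
        .option(ConnectionFactoryOptions.HOST, "db-host")      // placeholder
        .option(ConnectionFactoryOptions.DATABASE, "service")  // placeholder
        .option(ConnectionFactoryOptions.USER, "user")         // placeholder
        .option(ConnectionFactoryOptions.PASSWORD, "password") // placeholder
        .build());

ConnectionPool connectionPool = new ConnectionPool(
    ConnectionPoolConfiguration.builder(factory)
        .maxSize(5) // the pool of 5 connections mentioned above
        .build());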

I might be missing something and forgetting to close the result set, which is something I would do in plain JDBC. But I haven't seen anything in the docs here about that.

Example of use

I suspect that I am not closing the result set below; I am just closing the connection (returning it to the pool) after each statement:

    public Mono<CouponDto> fetchCoupon(final long itemId, final long couponId, final long subsidiaryId) {
        return Flux.usingWhen(
                connectionPool.create(),
                connection -> Mono.from(connection.createStatement(QUERY)
                        .bind(0, subsidiaryId)
                        .bind(1, itemId)
                        .bind(2, couponId)
                        .execute()
                ).flatMapMany(it -> it.map(mapper)),
                Connection::close,
                ((connection, throwable) -> connection.close()),
                Connection::close
        ).next();
    }

    private final BiFunction<Row, RowMetadata, CouponDto> mapper = (row, rowMetadata) ->
            new CouponDto(
                    row.get("id", Long.class),
                    row.get("discountPercentage", Double.class),
                    row.get("listPrice", Double.class),
                    row.get("discountType", Integer.class)
            );

Again, thanks for your help & time!
Best Regards, Matheus Rambo

@Michael-A-McMahon (Member)

Hi Matheus! Thank you for this wonderfully detailed analysis; it helps me immensely to see it all.

I'm sort of out this week, so a proper investigation might have to wait until next week.

For now, I'll fill you in on how Oracle R2DBC closes the underlying JDBC Statement and ResultSet. As you've noted, the R2DBC SPI does not have an explicit close() method for statements and results. Instead, we have an implicit close that occurs when your code "fully consumes" the results.
By results, I mean the Result objects emitted by the Publisher<Result> that Statement.execute() returns. For each Result emitted, your code must consume it by calling a method like getRowsUpdated or map (I see your code is calling map).
In the case of Result.map(BiFunction), a Publisher<Row> is returned. To fully consume this result, your code must subscribe to this publisher and receive all Rows from the publisher. After the publisher emits the last row, Oracle R2DBC will close the JDBC Statement and ResultSet.
Keep in mind that if the Publisher<Result> emits multiple results, each result must be fully consumed to close the statement. So if you have a PL/SQL call returning multiple cursors (i.e., multiple ResultSets via DBMS_SQL.RETURN_RESULT), the Publisher<Result> emits a Result for each cursor, and a subscriber must receive each row of each result.
One final thing, and this is important: a subscriber may cancel its subscription to the Publisher<Result> or the Publisher<Row>. If the subscriber cancels, then Oracle R2DBC needs to handle that just as if the subscriber had received all the rows; Oracle R2DBC should close the JDBC statement if a subscriber cancels its subscription to these publishers.
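
To illustrate full consumption, here is a minimal sketch (the query and table name are hypothetical):

import io.r2dbc.spi.Connection;
import org.reactivestreams.Publisher;
import reactor.core.publisher.Flux;

// Sketch: fully consuming every Result emitted by Statement.execute(), so
// Oracle R2DBC can close the underlying JDBC Statement and ResultSet.
// The table name is hypothetical.
static Publisher<String> selectNames(Connection connection) {
  return Flux.from(connection.createStatement(
        "SELECT name FROM example_table")
      .execute())
    // Each Result is consumed in order. Result.map(BiFunction) returns a
    // Publisher<String> that must emit all of its rows (or be cancelled)
    // before the statement can be closed.
    .concatMap(result ->
      result.map((row, metadata) -> row.get("name", String.class)));
}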

I suspect that DBMS_SQL.RETURN_RESULT is in play here. In the dump, we see T4CPreparedStatement.openImplicitResultSets. This field is where Oracle JDBC stores ResultSets for RETURN_RESULT cursors.

I mentioned cancellation earlier. Flux.next() is specified to do exactly that:
https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#next--
So what I think your code will do is receive the first row of the first result, and then send a cancel signal up to Oracle R2DBC. This should trigger a close of the JDBC statement, but your memory dump suggests it does not. There's good reason to think Oracle R2DBC has a bug here.
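
As a standalone illustration (not Oracle-specific), you can observe the cancel signal that next() sends:

import reactor.core.publisher.Flux;

// Flux.next() takes the first element and then cancels its subscription
// to the upstream; doOnCancel makes the cancel signal visible.
Flux.range(1, 100)
    .doOnCancel(() -> System.out.println("upstream cancelled"))
    .next()
    .subscribe(value -> System.out.println("received " + value));
// Both "upstream cancelled" and "received 1" are printed: the upstream
// is cancelled after the first element.
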
I'll want to investigate this more when I can. This code looks suspect:

private Publisher<Void> deallocate(Collection<OracleResultImpl> results) {

@sgtcortez (Author)

Hi @Michael-A-McMahon, thank you for your response.

We do have some queries which return more than one row, but if our application works correctly, it consumes all rows returned by the driver.

I will try a workaround here and see if it works. For example, I do not know why I am using Flux.usingWhen(), because the query returns just one row; I might be able to switch to Mono.usingWhen().

@sgtcortez (Author)

I was looking at my code, thinking about the cancellation on next().

So what I think your code will do is receive the first row of the first result, and then send a cancel signal up to Oracle R2DBC. This should trigger a close of the JDBC statement, but your memory dump suggests it does not. There's good reason to think Oracle R2DBC has a bug here.

As you commented on this issue a while ago:

For Oracle R2DBC, a Result is no longer valid after the connection is closed.

So, since I have the next() operator downstream, I suspect that when it cancels, the connection is closed but the result set is not.

@sgtcortez (Author) commented Aug 18, 2022

So, instead of:

    public Mono<CouponDto> fetchCoupon(final long itemId, final long couponId, final long subsidiaryId) {
        return Flux.usingWhen(
                connectionPool.create(),
                connection -> Mono.from(connection.createStatement(QUERY)
                        .bind(0, subsidiaryId)
                        .bind(1, itemId)
                        .bind(2, couponId)
                        .execute()
                ).flatMapMany(it -> it.map(mapper)),
                Connection::close,
                ((connection, throwable) -> connection.close()),
                Connection::close
        ).next();
    }

I can write something like:

    public Mono<CouponDto> fetchCoupon(final long itemId, final long couponId, final long subsidiaryId) {
        return Mono.usingWhen(
                connectionPool.create(),
                connection -> Mono.from(connection.createStatement(QUERY)
                        .bind(0, subsidiaryId)
                        .bind(1, itemId)
                        .bind(2, couponId)
                        .execute()
                ).flatMapMany(it -> it.map(mapper)).next(),
                Connection::close,
                ((connection, throwable) -> connection.close()),
                Connection::close
        );
    }

Does it make sense to you? It does to me, because the result is no longer available after the connection is closed.
Now next() should emit the cancellation signal, and the driver must free the resources.

@sgtcortez (Author)

I tried my guess, without success.
The memory leak still occurs ...

I applied a load test to both forms; in both, the leak occurs.
I put a breakpoint in the method that you mentioned (deallocate), and it is called.

So, in theory, the PreparedStatement is closed, and according to the ResultSet docs, if the statement is closed, the ResultSet is automatically closed:

A ResultSet object is automatically closed when the Statement object that generated it is closed, re-executed, or used to retrieve the next result from a sequence of multiple results.

So, I might be wrong, but something strange is happening, and I suspect that the problem is not in oracle-r2dbc.

@Douglas-Surber commented Aug 18, 2022 via email

@sgtcortez (Author)

Hi @Douglas-Surber, thanks for your help.

You are right, I do have a call to a stored procedure.

I have the following statement:

  select Pck_Packs_Virtuais.Fnc_Analisa_Cesta_Json(?)
  as retorno from dual

which is called from:

    public Flux<PackDto> fetchPackIdentifier(final PackIdentifierDto input) {
        return Flux.usingWhen(
                        connectionPool.create(),
                        connection -> Mono.from(connection.createStatement(PROCEDURE)
                                .bind(0, JsonUtil.toJson(List.of(input)))
                                .execute()
                        ).flatMapMany(it -> it.map(mapper)),
                        Connection::close,
                        ((connection, throwable) -> connection.close()),
                        Connection::close
                ).map(s -> List.of(JsonUtil.fromJson(s, PackDto[].class)))
                .flatMap(Flux::fromIterable);
    }

    private final BiFunction<Row, RowMetadata, String> mapper = (row, rowMetadata) ->
            Optional.ofNullable(row.get(0, String.class)).orElse("");

JsonUtil is just a static utility class which converts JSON to Java objects and vice versa.

It seems to consume all the results ... One thing that diverges is that I am not making the procedure call the way it is shown in the readme.

@Michael-A-McMahon (Member)

I've had some time to look at the Oracle JDBC code, and I've made a concerning discovery:
That openImplicitResultSets field we see in your dump is never cleared. Combined with statement caching (Oracle R2DBC enables this), that ArrayDeque will continue to grow until the statement is evicted from the cache.

Next steps:

  1. Fix the bug in Oracle JDBC.
  2. Work around the bug in Oracle R2DBC.

Things are quite busy in Oracle land these days, but I'll try to get on this as soon as I can.

Huge THANK YOU to @sgtcortez for finding this issue and bringing it to our attention.

@sgtcortez (Author)

Thank you @Michael-A-McMahon.

I am so glad to help!

@sgtcortez (Author)

Hi @Michael-A-McMahon

It has been working for a week now ...
[screenshot: stable memory usage over the past week]

I compiled the main branch and added it to my repository.
Waiting for the release on Maven ...

Thanks

@Michael-A-McMahon (Member)

Wow! Great to see this follow-up!

We aim for a quarterly release cadence. 1.0.0 dropped in the summer, so we should have a new release soon.

Thanks again for the excellent data you provided on this issue; it really helped us a lot. But more importantly, it will help other Oracle JDBC and R2DBC users who might have hit this bug.

@htejwani

Hello, do you see this bug causing "ORA-01000: maximum open cursors exceeded" errors? We are getting this error during our load tests.

@Michael-A-McMahon (Member)

Absent any bug in Oracle R2DBC, ORA-01000 is a symptom of a Result not being consumed. That is: a Result is emitted downstream, but no method is called to consume it: map, getRowsUpdated, or flatMap.
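
For example, here is a minimal sketch (hypothetical table and bind value) of consuming an INSERT's Result even when the update count isn't needed:

import io.r2dbc.spi.Connection;
import org.reactivestreams.Publisher;
import reactor.core.publisher.Mono;

// Sketch: consume an INSERT's Result with getRowsUpdated so the cursor
// backing the Result is released. Table and bind value are hypothetical.
static Publisher<Void> insertRow(Connection connection) {
  return Mono.from(connection.createStatement(
        "INSERT INTO example_table (id) VALUES (?)")
      .bind(0, 1)
      .execute())
    .flatMap(result -> Mono.from(result.getRowsUpdated())) // consumes the Result
    .then(); // the count itself can be discarded
}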

If you can share some code that recreates the failure, I can try to debug it.

On my side, I wrote something just now to try and catch a bug:

import io.r2dbc.spi.Connection;
import io.r2dbc.spi.ConnectionFactories;
import io.r2dbc.spi.ConnectionFactoryOptions;
import oracle.r2dbc.OracleR2dbcOptions;
import org.reactivestreams.Publisher;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

import static java.lang.String.format;

public class LoadTest {

  static Publisher<? extends Connection> connect() {
    var options =
      ConnectionFactoryOptions
        .parse("r2dbc:oracle://phoenix")
        .mutate()
        .option(
          OracleR2dbcOptions.TNS_ADMIN, "/Users/michaelmcmahon/.oracle/database")
        .build();
    return ConnectionFactories.get(options).create();
  }

  static Publisher<Void> query(Connection connection, int value) {
    return Mono.from(connection.createStatement(format(
      "SELECT %d FROM sys.dual", value))
      .execute())
      .flatMap(result ->
        Mono.from(result.map(row -> row.get(0))))
      .doOnNext(System.out::println)
      .then();
  }

  public static void main(String[] args) {
    Flux.usingWhen(
      connect(),
      connection ->
        Flux.range(0, 5000)
          .concatMap(i -> query(connection, i)),
      Connection::close)
      .blockLast();
  }
}

But this code is not failing, so yours must be doing something different.

@htejwani

Thanks @Michael-A-McMahon for your prompt response. Yes, all our INSERT/UPDATE queries were leaking cursors because we were not invoking the getRowsUpdated method, since we didn't need that information. Adding this call fixed the cursor leak issue. Thanks.

@Michael-A-McMahon (Member)

I'm glad you've found the issue.

It is understandable that applications won't always call getRowsUpdated for INSERT/UPDATE. So in the 1.1.0 release, I've relaxed the requirement: For INSERT/UPDATE/DELETE, the cursor is automatically closed, even if the Result is not consumed. This seemed more sensible because we only need to keep the cursor open if we are fetching row data or PL/SQL OUT parameters from that cursor. There was no need to keep the cursor around if the result is just an update count.

I'm mentioning this because, if you can update to 1.1.0, then you won't need to change your code.
(As of writing, we are still waiting for the 1.1.0 jar to be published on Maven Central. I believe it will be there very soon though.)
