Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using cursor causes OOM #2812

Open
gallyamb opened this issue Feb 20, 2023 · 9 comments
Open

Using cursor causes OOM #2812

gallyamb opened this issue Feb 20, 2023 · 9 comments

Comments

@gallyamb
Copy link
Contributor

gallyamb commented Feb 20, 2023

Сursors in certain circumstances became useless, because hold all the data in the memory, causing OOM:

  • on large number of rows fetched via one cursor (depends on Xmx)
  • on any amount of data fetched via cursor, if that cursor is created and iterated sufficient number of times (e.g. fetch 1000 rows in a loop via different cursors N times (N depends on Xmx))

This happens because MyBatis:

  1. registers all cursors created within session and hold a hard reference to them (so they could not be garbage collected)
  2. cursor itself caches each row's data, when there is any nested result map, via hard reference

MyBatis version

3.5.11

Database vendor and version

PostgreSQL 13

Test case or example project

https://github.com/gallyamb/mybatis-3/blob/cursor-cache-oom/src/test/java/org/apache/ibatis/submitted/cursor_cache_oom/CursorOomTest.java

Steps to reproduce

Use result map with nested result map and fetch data via cursor

Expected result

The memory will be used at constant level

Actual result

The memory used as linear with fetched amount of data, i.e. more rows fetched via cursor - more memory is used

gallyamb added a commit to gallyamb/mybatis-3 that referenced this issue Feb 20, 2023
@gallyamb
Copy link
Contributor Author

FYI test case in PR #2813

@harawata
Copy link
Member

Hello @gallyamb ,

When using cursor involving nested results, you should specify resultOrdered="true" in the <select>.
The test still won't pass, but the DefaultResultSetHandler.nestedResultObjects will hold nested objects of the 'current' item only.
https://mybatis.org/mybatis-3/sqlmap-xml.html#select

Please let me know if the problem persists.

@gallyamb
Copy link
Contributor Author

gallyamb commented Feb 21, 2023

@harawata thank you for a quick reply!

I confirm, that adding resultOrdered="true" result in keeping only current row's data in memory (pushed a commit with that change). I've added resultOrdered="true", but did not add ORDER BY clause in SQL, and everything works like a charm

The Cusros' javadoc says, that resultOrdered="true" have to be used, when I have a collection, but in my test case I haven't. So why each row's data is still persisted in memory? Isn't such behaviour is a bug?

* Cursor contract to handle fetching items lazily using an Iterator. Cursors are a perfect fit to handle millions of
* items queries that would not normally fit in memory. If you use collections in resultMaps then cursor SQL queries
* must be ordered (resultOrdered="true") using the id columns of the resultMap.

@harawata
Copy link
Member

Hi @gallyamb ,

When there is a <collection> and the rows are not properly ordered, the result could be incorrect.
For <association>, the result may be correct regardless of the order, so it's not a 'must'.
That is what the Javadoc comment means, I assume.

hazendaz added a commit that referenced this issue Mar 2, 2023
[Test case] Using cursor causes OOM #2812
@hazendaz
Copy link
Member

hazendaz commented Mar 2, 2023

FWIW - The test case that was there showed possibly some small code coverage improvement, not sure its really OOM there since it of course is working but went ahead and merged that in as it had some references indicating sort of situation that could be problematic, probably from this issue. Figured ok another test, it was well written and built ok.

@gallyamb
Copy link
Contributor Author

gallyamb commented Mar 3, 2023

Hi @harawata!

To be honest, I could not see any difference between "could be incorrect" and "may be correct". In both scenarios it either correct or not

Btw, for association I cannot imagine a situation, when result may be incorrect. Each row contains whole entity (properties of entity itself and (if association exists) properties of association)

Summing up, I think, that for scenario, when entity has only associations, but not collections, constant amount of memory must be used, despite resultOrdered="true" is specified or not

Am I correct?

grallcd added a commit to grallcd/mybatis-3 that referenced this issue Mar 6, 2023
* Update dependency org.javassist:javassist to v3.29.2-GA

* Update dependency org.slf4j:slf4j-api to v2.0.1

* Update dependency org.apache.logging.log4j:log4j-api to v2.19.0

* Update dependency org.apache.logging.log4j:log4j-core to v2.19.0

* Update dependency ch.qos.logback:logback-classic to v1.4.1

* [pom] Update project build output timestamp for reproducible builds

* [ci] Correct spelling mistake in proxy factory in better way to avoid naming conflict

Alternative attempt to mybatis#2692

* [maven-release-plugin] prepare release mybatis-3.5.11

* [maven-release-plugin] prepare for next development iteration

* Update dependency org.slf4j:slf4j-api to v2.0.2

* Update junit5 monorepo to v5.9.1

* Referencing collection parameter by name fails

... when the first argument is a special one (i.e. `RowBounds` or `ResultHandler`).

fixes mybatis#2693

* Update dependency org.slf4j:slf4j-api to v2.0.3

* Update dependency org.testcontainers:junit-jupiter to v1.17.4

* Update dependency org.testcontainers:mysql to v1.17.4

* Update dependency org.testcontainers:postgresql to v1.17.4

* Fix a race condition caused by other threads calling mapper methods while mappedStatements are being constructed

* Update dependency ch.qos.logback:logback-classic to v1.4.2

* Update dependency ch.qos.logback:logback-classic to v1.4.3

* Update dependency org.testcontainers:junit-jupiter to v1.17.5

* Update dependency org.testcontainers:mysql to v1.17.5

* Update dependency org.testcontainers:postgresql to v1.17.5

* Update dependency ch.qos.logback:logback-classic to v1.4.4

* Update dependency org.mockito:mockito-junit-jupiter to v4.8.1

* Update dependency mysql:mysql-connector-java to v8.0.31

* Update dependency org.mockito:mockito-core to v4.8.1

* Enable ability to provide custom configuration to XMLConfigBuilder

Supported via the InputStream/Reader constructors

Default constructors will keep using a new configuration

* A subclassed configuration class can now be passed to allow XMLConfigBuilder
to use a specific implementation when creating the configuration

* Adding mapper could fail under JPMS

Under JPMS, ClassLoader#getResources() throws `FileSystemException` it seems.
It might be better to catch IOException, but it could swallow some exception that should be thrown.

Should fix mybatis#2598

* Update mockito monorepo to v4.9.0

* Update dependency org.testcontainers:junit-jupiter to v1.17.6

* Update dependency org.testcontainers:mysql to v1.17.6

* Update dependency org.testcontainers:postgresql to v1.17.6

* Update dependency org.slf4j:slf4j-api to v2.0.4

* Update dependency ch.qos.logback:logback-classic to v1.4.5

* Minor correction: boolean can never be null

* Added failing tests

Some DBs support INSERT, UPDATE or DELETE statement that returns result set.
PostgreSQL has RETURNING
MS SQL Server has OUTPUT

MyBatis can return results by using @select, @SelectProvider or <select />, however, rollback does not work as expected because SELECT does not mark the session 'dirty'.

* Added 'affectData' attribute to SELECT statements

To indicate the SELECT affects DB data.
e.g. PostgreSQL's RETURNING, MS SQL Server's OUTPUT

* Update dependency ch.qos.reload4j:reload4j to v1.2.23

* Bump postgresql from 42.5.0 to 42.5.1

Bumps [postgresql](https://github.com/pgjdbc/pgjdbc) from 42.5.0 to 42.5.1.
- [Release notes](https://github.com/pgjdbc/pgjdbc/releases)
- [Changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md)
- [Commits](pgjdbc/pgjdbc@REL42.5.0...REL42.5.1)

---
updated-dependencies:
- dependency-name: org.postgresql:postgresql
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update dependency org.slf4j:slf4j-api to v2.0.5

* Documentation

* selectCursor() should respect affectData as well

* Update dependency ch.qos.reload4j:reload4j to v1.2.24

* Missing entry in es and ko doc [ci skip]

* Awful typo [ci skip]

* [pom] Update mysql-connector-j to new GAV relocation

* [ci] Use charset with filewriter

* [pom] Add filter to exclude MANIFEST.MF from uber jar from other jars (using ours)

clears warning and behaviour is the same.

* Update dependency org.slf4j:slf4j-api to v2.0.6

* Update mockito monorepo to v4.10.0

* Avoid false-negative

* Update mockito monorepo to v4.11.0

* Update dependency org.mybatis:mybatis-parent to v37

* Resolve resultType by namespace and id when not provide resultType and resultMap

* [actions] Drop jdk 18

* Minor refactoring, more tests

* Update dependency org.assertj:assertj-core to v3.24.1

* Update junit5 monorepo to v5.9.2

* Update mockito monorepo to v5

* Update dependency org.assertj:assertj-core to v3.24.2

* Update dependency com.mysql:mysql-connector-j to v8.0.32

* Update dependency maven to v3.8.7

* Include JDK 21 (EA at this point) in CI build

* [pom] Add mockito subclass support

some of our tests need the legacy style support that mockito made available.  We should look at those in longer term to get on fact that inline is now the default as per mockito we will likely have other issues.

* Update mockito monorepo to v5.1.0

* Update mockito monorepo to v5.1.1

* Update dependency org.postgresql:postgresql to v42.5.2

* [ci] Type the array lists

* Update dependency org.postgresql:postgresql to v42.5.3

* [ci] Sort imports

* [ci] Run partial formatting

* [ci] Update copyright dates

* [ci] Remove extra /p as not required in the javadoc

* [ci] formatting

* [ci] Update copyright dates

* [ci] formatting

* [ci] Update copyright date

* [ci] formatting

* [ci] Formatting

* [ci] formatting

* [ci] formatting

* [ci] formatting

* [ci] formatting

* [ci] Add note about formatting to readme

* [ci] Try adjusting formatting on readme

* [pom] For pre jdk 16 skip impsort

* [ci] Fix readme for formatter tags

* [mvn] Bump to maven 3.9.0 and change maven.config to 3.9/4.x style

* Search readable property when resolving constructor arg type

Should fix mybatis#2803

* Update test code to clarify the consequence

* Update dependency org.postgresql:postgresql to v42.5.4

* Add Unit Test for PooledDataSource

* Using cursor causes OOM mybatis#2812

* [ci] Minor formatting adjustments

* [ci] Import sort new test class

* [ci] Fix end of line markers at end of file per git standards (ie empty last line)

* [ci] Apply open rewrite partially on java cleanup

* [ci] Add private constructors to classes that cannot be instantiated (all static)

* [test] Remove dumb test - not valid now as class is static and should not have been instantiated

* [test] Remove invalid extension of class for testing

* [ci] Tabs to spaces

* [ci] Add final where appropriate per open rewrite

relaxed some of its suggestions but generally took most.

* [pom] Move surefire configuration to parent and make properties more readable

* [pom] Update surefire property usage to non deprecated values

* [ci] Adjust argument

* [ci] Show active profiles being used

* [actions] Drop usage of ci only tests group as unnecessary / directly set excludedGroups

profile is not needed.  All we are doing is setting a property.

* [actions] Run all tests for ubuntu jdk 11 using arglines

* [pom] Add comment on how to disable excluded groups runs

* [pom] Add another comment around excluded groups

* [ci] Add readme that describes how our testing works

* [ci] Do not use star imports (across test classes)

* [ci] Use diamond operators (test classes)

squash with diamond

* [ci] Add missing seial version uid at 1L (throughout tests)

* [ci] Use mockito argument matchers instead of deprecated logic

* [ci] Use 'L' instead of 'l' for long

* [ci] Remove new of primative as known

* [ci] Remove super as no super to call

* [ci] More direct returns to save on GC

* [ci] Various small items cleaning up assignments

* [ci] Better parsing wrapper classes

* [ci] Remove public from methods in interfaces

* [ci] Formatting / update copyright dates

* [ci] Add missing overrides

* [ci] Various small cleanups

* [ci] Use objects class for hash, equals usage

* [ci] Cleanup if statement breaks / return logic

* [ci] Remove unnecessary super calls

* [ci] Make static classes 'static'

* [ci] Remove unnecessary parentheses and allow quicker exit

* [ci] Add missing serial version uid as 1L

* [ci] Use diamond operator

* [ci] Remove named wrapper of list as type given already

* [ci] Remove unused variable from internal method

* [ci] Smarter list creation

* [ci] Update copyright dates

* [ci] Add back type as not until jdk 9 can this be determined

* Add resultOrdered="true" and change expected cursor's nestedResultObjects size to be less than 3

* Update dependency org.apache.logging.log4j:log4j-api to v2.20.0

* Update dependency org.apache.logging.log4j:log4j-core to v2.20.0

* Use CountDownLatch to coordinate the execution order

* [pom] Add some comments in the pom for clarity

* [pom] Remove compiler test compiler argument parameters as not needed now

* [maven-release-plugin] prepare release mybatis-3.5.12

* [maven-release-plugin] prepare for next development iteration

* [ci] Add release backup to gitignore

* [fix] Change formatter around javadoc to retain L&F and correct javadoc error

* [maven-release-plugin] prepare release mybatis-3.5.12

* [maven-release-plugin] prepare for next development iteration

* fix Chinese translation error for `parameterType`

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Jeremy Landis <[email protected]>
Co-authored-by: Iwao AVE! <[email protected]>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: tianshuang <[email protected]>
Co-authored-by: Willie Scholtz <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: FlyInWind <[email protected]>
Co-authored-by: hogimn <[email protected]>
Co-authored-by: gallyamb <[email protected]>
Co-authored-by: puppylpg <[email protected]>
@harawata
Copy link
Member

harawata commented Mar 8, 2023

Hi @gallyamb ,

I thought about it a long time ago and there is a corner case.
I'll try to explain, but first, you need to understand that resultOrdered="true" is a general performance hint and is not directly related to Cursor.

Let's say, a query returns the following result set.

id name friend_name
1 User1 John
2 User2 Mary
1 User1 Bob

The mapper statement returns List<User>.

List<User> getUsers();

The result maps are exactly the same as your demo.

<resultMap id="User" type="xxx.User">
  <id property="id" column="id"/>
  <result property="name" column="name"/>
  <association property="friend" resultMap="Friend"/>
</resultMap>

<resultMap id="Friend" type="xxx.Friend">
  <result property="name" column="friend_name"/>
</resultMap>

As the User.id is specified as <id>, the method should return two User instances in the List.
The actual result is...

  • Without resultOrdered="true", the returned List contains two User instances.
  • With resultOrdered="true", the returned List contains three User instances.

You can argue that it is an incorrect usage and I do not necessarily disagree, but still, the result is 'correct' when resultOrdered is not enabled and we should not break it.

Note that if the return type of the mapper method is Cursor<User>, it will return three User instances with or without resultOrdered="true".

If you think you can improve the documentation or the Javadoc comment, please send us a PR and we will review.
And please let me know if you need further clarification.

@gallyamb
Copy link
Contributor Author

gallyamb commented Mar 9, 2023

@harawata thanks for a detailed clarification! It's very helpful

In my personal opinion, the library should not make its' usage more complicated for users, that use it in correct way, in a try to not break the usage for users (I think, the minority ones), that use it incorrectly

As you said, cursor returns three instances with or without resultOrdered="true". It's an inconsistency between cursors based and basic collections based method, that, imho, more harmful for the users

And I think, that situation, when multiple rows contains same entity (by id) but with different attribute values (either plain attribute or association) should be treated as undefined behaviour (or better throw an exception)

But also I understand, that breaking changes are not expected by the users in minor releases. So maybe we should plan this change, if you also agree with me, ofc, to be included in some future major release?

What comes for a documentation improvement, first we need to clarify further actions on our issue to understand what to do with docs, I think

@harawata
Copy link
Member

harawata commented Mar 9, 2023

Hi @gallyamb ,

I think the current behavior is fine.

  • The default (resultOrdered="false") works for all users.
  • Users who need the best performance can find resultOrdered="true" in the doc.

This is exactly what a good library should do.

Going back to your original report, I think we can add an explanation about the effect of resultOrdered="true" on association.
Or, we can simply say that resultOrdered should be enabled when using Cursor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants