Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26751][SQL] Fix memory leak when statement run in background and throw exception which is not HiveSQLException #23673

Closed

Conversation

caneGuy
Copy link
Contributor

@caneGuy caneGuy commented Jan 28, 2019

What changes were proposed in this pull request?

When we run in background and we get exception which is not HiveSQLException,
we may encounter memory leak since handleToOperation will not removed correctly.
The reason is below:

  1. When calling operation.run() in HiveSessionImpl#executeStatementInternal we throw an exception which is not HiveSQLException
  2. Then the opHandle generated by SparkSQLOperationManager will not be added into opHandleSet of HiveSessionImpl , and operationManager.closeOperation(opHandle) will not be called
  3. When we close the session we will also call operationManager.closeOperation(opHandle),since we did not add this opHandle into the opHandleSet.

For the reasons above,the opHandled will always in SparkSQLOperationManager#handleToOperation,which will cause memory leak.
More details and a case has attached on https://issues.apache.org/jira/browse/SPARK-26751
This patch will always throw HiveSQLException when running in background

How was this patch tested?

Exist UT

…kground and throw exception which is not HiveSQLException
@SparkQA
Copy link

SparkQA commented Jan 28, 2019

Test build #101757 has finished for PR 23673 at commit 7addd01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@caneGuy caneGuy changed the title [SPARK-26751][ThriftServer] Fix memory leak when statement run in background and throw exception which is not HiveSQLException [SPARK-26751][SQL] Fix memory leak when statement run in background and throw exception which is not HiveSQLException Jan 29, 2019
@srowen
Copy link
Member

srowen commented Jan 30, 2019

I don't know this part well, but that seems reasonable for consistency with how the exception is handled above it, and the reasons you give. The Hive code that executes this does appear to expect a HiveSQLException, certainly in a non-fatal case. I guess that code could be expanded to catch all exceptions and clean up in all cases, but that is code we copied from Hive right? so maybe best to not modify it.

srowen pushed a commit that referenced this pull request Feb 3, 2019
…nd throw exception which is not HiveSQLException

## What changes were proposed in this pull request?
When we run in background and we get exception which is not HiveSQLException,
we may encounter memory leak since handleToOperation will not removed correctly.
The reason is below:
1. When calling operation.run() in HiveSessionImpl#executeStatementInternal we throw an exception which is not HiveSQLException
2. Then the opHandle generated by SparkSQLOperationManager will not be added into opHandleSet of HiveSessionImpl , and operationManager.closeOperation(opHandle) will not be called
3. When we close the session we will also call operationManager.closeOperation(opHandle),since we did not add this opHandle into the opHandleSet.

For the reasons above,the opHandled will always in SparkSQLOperationManager#handleToOperation,which will cause memory leak.
More details and a case has attached on https://issues.apache.org/jira/browse/SPARK-26751
This patch will always throw HiveSQLException when running in background

## How was this patch tested?
Exist UT

Closes #23673 from caneGuy/zhoukang/fix-hivesessionimpl-leak.

Authored-by: zhoukang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 255faaf)
Signed-off-by: Sean Owen <[email protected]>
srowen pushed a commit that referenced this pull request Feb 3, 2019
…nd throw exception which is not HiveSQLException

## What changes were proposed in this pull request?
When we run in background and we get exception which is not HiveSQLException,
we may encounter memory leak since handleToOperation will not removed correctly.
The reason is below:
1. When calling operation.run() in HiveSessionImpl#executeStatementInternal we throw an exception which is not HiveSQLException
2. Then the opHandle generated by SparkSQLOperationManager will not be added into opHandleSet of HiveSessionImpl , and operationManager.closeOperation(opHandle) will not be called
3. When we close the session we will also call operationManager.closeOperation(opHandle),since we did not add this opHandle into the opHandleSet.

For the reasons above,the opHandled will always in SparkSQLOperationManager#handleToOperation,which will cause memory leak.
More details and a case has attached on https://issues.apache.org/jira/browse/SPARK-26751
This patch will always throw HiveSQLException when running in background

## How was this patch tested?
Exist UT

Closes #23673 from caneGuy/zhoukang/fix-hivesessionimpl-leak.

Authored-by: zhoukang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 255faaf)
Signed-off-by: Sean Owen <[email protected]>
@srowen
Copy link
Member

srowen commented Feb 3, 2019

Merged to master/2.4/2.3

@srowen srowen closed this in 255faaf Feb 3, 2019
@caneGuy
Copy link
Contributor Author

caneGuy commented Feb 13, 2019

Thanks @srowen the code is copied from hive , so i did not modify code there

@caneGuy caneGuy deleted the zhoukang/fix-hivesessionimpl-leak branch February 15, 2019 10:38
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…nd throw exception which is not HiveSQLException

## What changes were proposed in this pull request?
When we run in background and we get exception which is not HiveSQLException,
we may encounter memory leak since handleToOperation will not removed correctly.
The reason is below:
1. When calling operation.run() in HiveSessionImpl#executeStatementInternal we throw an exception which is not HiveSQLException
2. Then the opHandle generated by SparkSQLOperationManager will not be added into opHandleSet of HiveSessionImpl , and operationManager.closeOperation(opHandle) will not be called
3. When we close the session we will also call operationManager.closeOperation(opHandle),since we did not add this opHandle into the opHandleSet.

For the reasons above,the opHandled will always in SparkSQLOperationManager#handleToOperation,which will cause memory leak.
More details and a case has attached on https://issues.apache.org/jira/browse/SPARK-26751
This patch will always throw HiveSQLException when running in background

## How was this patch tested?
Exist UT

Closes apache#23673 from caneGuy/zhoukang/fix-hivesessionimpl-leak.

Authored-by: zhoukang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
…nd throw exception which is not HiveSQLException

## What changes were proposed in this pull request?
When we run in background and we get exception which is not HiveSQLException,
we may encounter memory leak since handleToOperation will not removed correctly.
The reason is below:
1. When calling operation.run() in HiveSessionImpl#executeStatementInternal we throw an exception which is not HiveSQLException
2. Then the opHandle generated by SparkSQLOperationManager will not be added into opHandleSet of HiveSessionImpl , and operationManager.closeOperation(opHandle) will not be called
3. When we close the session we will also call operationManager.closeOperation(opHandle),since we did not add this opHandle into the opHandleSet.

For the reasons above,the opHandled will always in SparkSQLOperationManager#handleToOperation,which will cause memory leak.
More details and a case has attached on https://issues.apache.org/jira/browse/SPARK-26751
This patch will always throw HiveSQLException when running in background

## How was this patch tested?
Exist UT

Closes apache#23673 from caneGuy/zhoukang/fix-hivesessionimpl-leak.

Authored-by: zhoukang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 255faaf)
Signed-off-by: Sean Owen <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
…nd throw exception which is not HiveSQLException

## What changes were proposed in this pull request?
When we run in background and we get exception which is not HiveSQLException,
we may encounter memory leak since handleToOperation will not removed correctly.
The reason is below:
1. When calling operation.run() in HiveSessionImpl#executeStatementInternal we throw an exception which is not HiveSQLException
2. Then the opHandle generated by SparkSQLOperationManager will not be added into opHandleSet of HiveSessionImpl , and operationManager.closeOperation(opHandle) will not be called
3. When we close the session we will also call operationManager.closeOperation(opHandle),since we did not add this opHandle into the opHandleSet.

For the reasons above,the opHandled will always in SparkSQLOperationManager#handleToOperation,which will cause memory leak.
More details and a case has attached on https://issues.apache.org/jira/browse/SPARK-26751
This patch will always throw HiveSQLException when running in background

## How was this patch tested?
Exist UT

Closes apache#23673 from caneGuy/zhoukang/fix-hivesessionimpl-leak.

Authored-by: zhoukang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 255faaf)
Signed-off-by: Sean Owen <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
…nd throw exception which is not HiveSQLException

## What changes were proposed in this pull request?
When we run in background and we get exception which is not HiveSQLException,
we may encounter memory leak since handleToOperation will not removed correctly.
The reason is below:
1. When calling operation.run() in HiveSessionImpl#executeStatementInternal we throw an exception which is not HiveSQLException
2. Then the opHandle generated by SparkSQLOperationManager will not be added into opHandleSet of HiveSessionImpl , and operationManager.closeOperation(opHandle) will not be called
3. When we close the session we will also call operationManager.closeOperation(opHandle),since we did not add this opHandle into the opHandleSet.

For the reasons above,the opHandled will always in SparkSQLOperationManager#handleToOperation,which will cause memory leak.
More details and a case has attached on https://issues.apache.org/jira/browse/SPARK-26751
This patch will always throw HiveSQLException when running in background

## How was this patch tested?
Exist UT

Closes apache#23673 from caneGuy/zhoukang/fix-hivesessionimpl-leak.

Authored-by: zhoukang <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 255faaf)
Signed-off-by: Sean Owen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants