Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-4583: [Plasma] Fix some small bugs reported by code scan tool #3656

Closed
wants to merge 12 commits into from

Conversation

guoyuhong
Copy link
Contributor

We used a static code scan tool to scan arrow code. There are several possible bugs:

  1. The return value of PyDict_SetItem is not used.
  2. Currently, EventLoop:: Shutdown should be called explicitly, which is error-prone and causing leak when the user forgets to call it.
  3. There is an unclosed file descriptor in io.cc when path name is too long.

Besides, we also made the following small changes:

  1. When we use Plasma in Yarn and when a node uses too much memory, a SIGTERM signal will be sent to Plasma. Current plasma will exit silently. We also some log to plasma store to help us to debug.
  2. ARROW_LOG will always evaluate the output expression even when it is not enabled, which is not efficient.
  3. The constructor of Java class ObjectStoreData is private, which is not convenient when we want to create a mock plasma store.
  4. Fix a call to ObjectStoreData which misplaces meta and data according to https://github.com/apache/arrow/blob/master/java/plasma/src/main/java/org/apache/arrow/plasma/ObjectStoreLink.java#L32 .

@xhochy
Copy link
Member

xhochy commented Feb 15, 2019

@guoyuhong Out of curiosity which "code scan tool" did you use?

@guoyuhong
Copy link
Contributor Author

@xhochy It is called Coverity.

@wesm
Copy link
Member

wesm commented Feb 15, 2019

Coverity just started offering free scans to open source projects again. @pitrou can you review?

@@ -55,7 +55,12 @@ enum class ArrowLogLevel : int {
};

#define ARROW_LOG_INTERNAL(level) ::arrow::util::ArrowLog(__FILE__, __LINE__, level)
#define ARROW_LOG(level) ARROW_LOG_INTERNAL(::arrow::util::ArrowLogLevel::ARROW_##level)
// clang-format off
#define ARROW_LOG(level) if ( \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lint tool conflicts here, so I disable clang-format.

@guoyuhong
Copy link
Contributor Author

@pcmoritz Please take a look at this PR.

@guoyuhong
Copy link
Contributor Author

Ping... @pcmoritz

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this @guoyuhong . Here are some comments.

cpp/src/arrow/python/deserialize.cc Outdated Show resolved Hide resolved
cpp/src/plasma/store.cc Outdated Show resolved Hide resolved
cpp/src/plasma/store.cc Outdated Show resolved Hide resolved
cpp/src/arrow/util/logging.h Outdated Show resolved Hide resolved
int ret = PyDict_SetItem(result.obj(), PyList_GET_ITEM(keys.obj(), i - start_idx),
PyList_GET_ITEM(vals.obj(), i - start_idx));
if (ret != 0) {
return ConvertPyError(StatusCode::PythonError);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry for not being specific here. It's better to not override the second parameter, so that the proper status code is returned (for example a Python MemoryError would be turned into a OutOfMemory Status, etc.).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(attempting to fix it directly from Github...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I checked out common.h. Yes, ConvertPyError() is better.

@pitrou
Copy link
Member

pitrou commented Feb 18, 2019

Rebased, will wait for CI.

@pitrou
Copy link
Member

pitrou commented Feb 18, 2019

Hmm, I really don't understand what the merge script is saying me here:

$ ./dev/merge_arrow_pr.py 
ARROW_HOME = /home/antoine/arrow
PROJECT_NAME = arrow
Which pull request would you like to merge? (e.g. 34): 3656

Pull request 3656 is not mergeable in its current form.
Continue? (experts only!) (y/n): n

@wesm did something weird happen on git master?

@wesm
Copy link
Member

wesm commented Feb 19, 2019

@pitrou master had a hiccup about 5 hours ago at 19:30 UTC; I accidentally created a merge commit in master (using the GitHub UI -- I have since asked INFRA to disallow merge commits through the UI) and quickly rebased and force-pushed master. It's possible that this PR was impacted. I have also seen the merge script misbehave at times like this when the GitHub API returns bad results now and then

@kszucs
Copy link
Member

kszucs commented Feb 19, 2019

@wesm I've rebased it since that.

@pitrou pitrou closed this in 135d481 Feb 19, 2019
@pitrou
Copy link
Member

pitrou commented Feb 19, 2019

Finally merged. Thanks @guoyuhong !

@guoyuhong
Copy link
Contributor Author

@pitrou Thanks!

@guoyuhong guoyuhong deleted the fixPlasma branch February 19, 2019 11:30
tanyaschlusser pushed a commit to tanyaschlusser/arrow that referenced this pull request Feb 21, 2019
We used a static code scan tool to scan arrow code. There are several possible bugs:
1. The return value of `PyDict_SetItem` is not used.
2. Currently, `EventLoop:: Shutdown` should be called explicitly, which is error-prone and causing leak when the user forgets to call it.
3. There is an unclosed file descriptor in `io.cc` when path name is too long.

Besides, we also made the following small changes:
1. When we use Plasma in Yarn and when a node uses too much memory, a SIGTERM signal will be sent to Plasma. Current plasma will exit silently. We also some log to plasma store to help us to debug.
2. `ARROW_LOG` will always evaluate the output expression even when it is not enabled, which is not efficient.
3. The constructor of Java class `ObjectStoreData` is private, which is not convenient when we want to create a mock plasma store.
4. Fix a call to `ObjectStoreData` which misplaces `meta` and `data` according to https://github.com/apache/arrow/blob/master/java/plasma/src/main/java/org/apache/arrow/plasma/ObjectStoreLink.java#L32 .

Author: Yuhong Guo <[email protected]>
Author: Antoine Pitrou <[email protected]>

Closes apache#3656 from guoyuhong/fixPlasma and squashes the following commits:

634e36a <Antoine Pitrou> Use default argument value to `ConvertPyError`
b547f2f <Yuhong Guo> remove if from ARROW_LOG
440f097 <Yuhong Guo> Address comment
d3eb22f <Yuhong Guo> Lint
79b4af3 <Yuhong Guo> Fix and Lint
434a039 <Yuhong Guo> Make constructor of ObjectStoreData public
b2ddba6 <Yuhong Guo> Fix misplace of meta and data in PlasmaClient.java
a667402 <Yuhong Guo> Do not evaluate logging strings when logging is not enabled.
5be7b89 <Yuhong Guo> Fix unclosed fd reported by code scan tool
3a917d4 <Yuhong Guo> Fix not used return value in deserialize.cc reported by code scan tool
ed56a48 <Yuhong Guo> Fix possible unclosed EventLoop reported by code scanning tool
3fed926 <Yuhong Guo> Add plasma log
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants