ARROW-17980: [C++] As-of-Join Substrait extension #14485
I'm a little torn on this PR. It seems to include #14415, which hasn't merged yet, primarily because it was blocked waiting for substrait-io/substrait#342. At this point I think @bkietz's remaining comments were fairly minor and I have addressed them. Also, since things are a bit squashed, it is unclear to me how much of #14415 this includes. Do you know whether you were able to squash in all the commits from #14415?
Also, this effectively means we are temporarily based on a Substrait PR while we wait for Substrait to catch up.
I'm not a big fan of the fact that we need the using statements for the substrait namespace, but if we really want that package name then it seems protobuf gives us little choice.
That being said, this PR works, I don't see anything functionally wrong with it, it is moving in the right direction, and it will make adopting the official Substrait release a smaller task. We should probably merge #14415 properly, though, rather than forcing it through as part of this PR.
Unfortunately, I can't say whether this PR includes all of #14415. To avoid confusion, I'm fine with marking this PR a draft. I'll hold to see how you and @bkietz suggest we should proceed. |
I left one comment about whether we want to require protobuf headers in order to build libarrow_python with substrait.
Additionally, find_package(Protobuf) seems to be missing from the CMakeLists.txt.
I would add the following here: https://github.com/rtpsw/arrow/blob/fce4dff7df8577731cccb220457ade5e24a33e60/python/CMakeLists.txt#L232
if(ARROW_PROTOBUF_USE_SHARED)
  find_package(Protobuf)
  include_directories(${PROTOBUF_INCLUDE_DIRS})
  # Debug output: list the include directories in effect.
  get_property(dirs DIRECTORY PROPERTY INCLUDE_DIRECTORIES)
  foreach(dir ${dirs})
    message("dir='${dir}'")
  endforeach()
  # Debug output: dump every CMake variable and its value.
  get_cmake_property(_variableNames VARIABLES)
  foreach(_variableName ${_variableNames})
    message("${_variableName}=${${_variableName}}")
  endforeach()
endif()
But, there might be a better place for it.
@anjakefala It is not. I would like to hide protobuf from python (I suspect an options_internal.h is doable somehow). |
I ended up merging. @westonpace, does it look like what you expected? |
How to deal with the following CI job's error?
|
@westonpace, note that this PR is currently using a |
|
@rtpsw Looks like the merge didn't go through cleanly. Let me know if you want me to clean this up. Have you been able to address the concern here: #14485 (review)? |
Note that this is a result of a GitHub automated rebase. @westonpace, if you can easily clean this up, that would be helpful.
I have removed the dependence on protobuf in the header, but I haven't added the proposed cmake snippet. It sounded like the place for this snippet is under discussion. I'm also not sure whether it is required in this PR or can be deferred. |
1c6f6d0 to 29cc649
I compared the code, mostly manually, and couldn't find anything missing. |
The tests were failing after a rebase due to a recent change to CheckRoundTrip (doesn't need a schema). I pushed a small fix. Unfortunately, the test is now failing for me because the process thread marks the plan complete and then tries to join itself. I copied over the change we made in b9cda43 to address this but that change relies on an executor always being present which won't be true until ARROW-15732 merges. |
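For readers unfamiliar with the failure mode described above, here is a minimal sketch of a thread that ends up on its own "finish" path and would otherwise join itself. It is not the fix used in this PR or in b9cda43. `JoinUnlessSelf` and `WorkerSkipsSelfJoin` are hypothetical names for illustration only and are not Arrow APIs. The sketch assumes plain `std::thread` and simply skips the join when the caller is the thread being joined, leaving the real join to the owning thread.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical helper (not an Arrow API): join `t` unless the caller *is* `t`.
// Returns true if a join happened, false if it was skipped.
bool JoinUnlessSelf(std::thread& t) {
  if (!t.joinable()) return false;
  // A thread joining itself would deadlock or throw std::system_error.
  if (t.get_id() == std::this_thread::get_id()) return false;
  t.join();
  return true;
}

// Reproduces the shape of the problem: the worker itself reaches the code
// that wants to join the worker. Returns true if that attempt was skipped.
bool WorkerSkipsSelfJoin() {
  std::promise<void> assigned;  // signals that `worker` is fully assigned
  std::promise<void> checked;   // signals that the worker made its attempt
  std::future<void> assigned_fut = assigned.get_future();
  std::future<void> checked_fut = checked.get_future();
  bool joined_from_worker = true;

  std::thread worker;
  worker = std::thread([&] {
    assigned_fut.wait();
    joined_from_worker = JoinUnlessSelf(worker);  // caller is `worker` itself
    checked.set_value();
  });
  assigned.set_value();
  checked_fut.wait();

  // The owning thread performs the real join.
  bool joined_from_owner = JoinUnlessSelf(worker);
  assert(joined_from_owner);
  (void)joined_from_owner;
  return !joined_from_worker;
}
```

A guard like this only avoids the self-join; it does not address when the plan is marked finished, which is the concern the executor-based change targets.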
@westonpace, I see you approved - is there anything holding up this PR? |
@rtpsw I believe it's the same failure you noticed before: a segfault that occurs because schedule task is synchronous. In the other PR you fixed it by requiring an executor, but that introduces ordering problems if I remember correctly. That may not matter for a test, though, and it would allow us to move forward on this. I will try this quickly. |
If it does introduce these ordering problems, there are alternative fixes for the segfault, e.g., use a standalone thread for that task or use a new |
fd8fedb to 80afd92
Agreed. I think we can address this later though. I've rebased (had to address the query context change) and would like to merge this when it is green. |
…he asof join node is not marked finished from the process thread. Doesn't currently work because executor can be null.
…executor for asof-join
80afd92 to ddaef9d
The problem was still happening. I've fixed it properly in #15104 using the same fix that was in the demo branch (ensuring that we always have an executor and the async scheduler is never synchronous). I've rebased this (once more) and the problem should no longer occur. |
CI failure seems surprising but unrelated. I filed #15137 to follow up. |
Failure appears unrelated. I will merge. |
@rtpsw thanks for your persistence on this one. Sorry for the rebase troubles at the end. |
Benchmark runs are scheduled for baseline = 9857891 and contender = 85db6f7. 85db6f7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
['Python', 'R'] benchmarks have high level of regressions. |
@westonpace, thanks for wrapping this up. |
Replacing apache#14385 Lead-authored-by: Yaron Gvili <[email protected]> Co-authored-by: Weston Pace <[email protected]> Signed-off-by: Weston Pace <[email protected]>
Replacing #14385