MapReduce framework in python coprocessor #3561
Replies: 6 comments
-
I'm interested in this issue. Maybe we can break it down into tracking issues and start with the Python API and the job tracker? @killme2008
-
@e1ijah1 Yep. I'm sorry I didn't get back to you sooner. If you are interested, you can create a discussion or RFC so that we can go over it in detail. I haven't yet thought carefully about the API and the operational flow.
-
A DataFrame API should be more convenient to use than MapReduce.
-
Yes, I guess we can just parallelize in
-
Perhaps the DataFrame API is a better choice: it enables optimizations such as columnar storage and vectorized computation, which can make it more efficient than MapReduce.
-
It seems we're working on GreptimeFlow as a standalone MapReduce execution engine. This issue can still be valuable, but it requires a more detailed design. Moving it to an open-ended discussion; we can come back to it once a design draft is available.
-
What problem does the new feature solve?
As we discussed before, supporting MapReduce processing in the Python coprocessor would make it possible to process large volumes of time-series data in a parallel, distributed way.
For example, suppose we have a `mapper` and a `reducer` function written in Python and want to process data distributed across multiple regions:
The user submits the functions, and the job is split into multiple tasks. Each task runs the `mapper` function on one region of the table data, in parallel and distributed. The `reducer` function collects all data partitions generated by the `mapper` function, then calculates and returns the final result.
What does the feature do?
Described above.
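The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the coprocessor API: `REGIONS`, `run_job`, and the row layout are hypothetical names invented for this example, and a local thread pool stands in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for table data split across regions:
# each region holds (host, cpu_usage) rows.
REGIONS = [
    [("host1", 0.5), ("host2", 0.25)],
    [("host1", 1.0), ("host3", 0.75)],
    [("host2", 0.75)],
]

def mapper(rows):
    """Runs once per region; returns a partial (sum, count) per host."""
    partial = {}
    for host, cpu in rows:
        total, count = partial.get(host, (0.0, 0))
        partial[host] = (total + cpu, count + 1)
    return partial

def reducer(partials):
    """Merges the partial results from all mappers into per-host averages."""
    merged = {}
    for partial in partials:
        for host, (total, count) in partial.items():
            t, c = merged.get(host, (0.0, 0))
            merged[host] = (t + total, c + count)
    return {host: t / c for host, (t, c) in merged.items()}

def run_job(mapper, reducer, regions):
    """Hypothetical job runner: one mapper task per region, then one reduce.

    Threads are used here only to mimic parallel tasks on a single machine.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(mapper, regions))
    return reducer(partials)

result = run_job(mapper, reducer, REGIONS)
print(result)
```

The mapper emits partial sums and counts rather than averages, so the reducer can combine results from different regions correctly; averaging per region first would give a wrong final value when regions hold different numbers of rows.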
Implementation challenges
Looks like we need the following:
`mapper` and `reducer` functions.