-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP][POC] Initial golang worker #17922
Conversation
Cgo has a significant performance impact, https://github.com/petermattis/fastcgo can reduce overhead, although it is unsafe. |
Could the vendored libraries be made to be retrieved by Bazel? Also, is the intention to have each go worker live in its own process? Or can they share a heap and GC like with Java worker? |
@jon-chuang Here the vendored libraries are introduced temporarily to help compile. Indeed the bazel can help to manage go dependence. The go worker process wasn't designed on purpose. I didn't figure out how to manage the relationship between actor and goroutine. |
extern void go_worker_execute(GoInt task_type, GoSlice ray_function_info, GoSlice args, | ||
GoSlice return_values); | ||
|
||
void go_worker_shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this to Shutdown?
return 0; | ||
} | ||
|
||
DataBuffer *RayBufferToDataBuffer(const std::shared_ptr<ray::Buffer> buffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you ensure the lifetime of the buffer? Who owns the respective buffers and is responsible for destructing it?
@@ -0,0 +1,328 @@ | |||
#include "go_worker.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you name this file ".cc" to respect the convention in the rest of the codebase?
} | ||
dataSize := C.ulong(len(b)) | ||
// we need copy to avoid go runtime change the address | ||
p := C.malloc(dataSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you malloc here, do you make sure to call free in the execute task context?
|
||
RAY_EXPORT DataValue *go_worker_AllocateDataValue(void *data_ptr, size_t data_size, | ||
void *meta_ptr, size_t meta_size) { | ||
auto dv = new DataValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that C++ will free DataValue created this way when passed back to C++ as a ptr? I believe it will not, right? How should it be managed, then?
|
||
std::vector<std::shared_ptr<DataValue>> return_value_list; | ||
for (size_t i = 0; i < return_ids.size(); i++) { | ||
return_value_list.push_back(make_shared<DataValue>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of making this make_shared
?
Since the return value list will get its pointers replaced by pointers generated from the callback without destructing the original make_shared
ptr, this would result in memory leak.
|
Could I ask what exactly is the issue with goroutines and actor? I believe this could be similar to async actors in python... In the context of a non-async actor, I don't believe the runtime will even need to spin up additional goroutines....? |
Actually I guess I understand your problem. Go worker is a single runtime context. As long as we ensure runtime spins up a new goroutine per task executed, and we do not block the main loop, we are probably fine. One should use some benchmarks to validate the performance. @chenk008 what do you think? Questions: does go worker block on calls to cgo/fastcgo? |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message. Please feel free to reopen or open a new issue if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public slack channel. Thanks again for opening the issue! |
Why are these changes needed?
This is an initial version for Ray to support the Golang worker.
Recently we've received a potential requirement which is to build a Golang FaaS platform on Ray. I did some PoC work to support Golang and succeed to support some basic function of Ray actor. This may be helpful to potential Golang users, so I filed this PR. Nowadays, some popular AI project are starting to use Golang, like
gorgonia
,gop
,gonum
and etc.This PR creates a simple prototype, only support actor creation and actor's method remote call. In Ray's core code, it's needed to support custom worker entry-point because most Golang programmers are used to compile their code into an executable binary file.
For now, a new subdirectory
golang
is added. But I think putting Golang code into a seperated repository is also an option so it can usego mod
to manage the dependencies. Would you please let me know your guys' thoughts?gorgonia: https://github.com/gorgonia/gorgonia
gop: https://github.com/goplus/gop
gonum: https://github.com/gonum/gonum
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.