Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][POC] Initial golang worker #17922

Closed
wants to merge 153 commits into from
Closed

Conversation

chenk008
Copy link
Contributor

@chenk008 chenk008 commented Aug 18, 2021

Why are these changes needed?

This is an initial version for Ray to support the Golang worker.

Recently we've received a potential requirement which is to build a Golang FaaS platform on Ray. I did some PoC work to support Golang and succeed to support some basic function of Ray actor. This may be helpful to potential Golang users, so I filed this PR. Nowadays, some popular AI project are starting to use Golang, like gorgonia, gop, gonum and etc.

This PR creates a simple prototype, only support actor creation and actor's method remote call. In Ray's core code, it's needed to support custom worker entry-point because most Golang programmers are used to compile their code into an executable binary file.

For now, a new subdirectory golang is added. But I think putting Golang code into a seperated repository is also an option so it can use go mod to manage the dependencies. Would you please let me know your guys' thoughts?

gorgonia: https://github.com/gorgonia/gorgonia
gop: https://github.com/goplus/gop
gonum: https://github.com/gonum/gonum

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@chenk008
Copy link
Contributor Author

chenk008 commented Sep 18, 2021

Cgo has a significant performance impact, https://github.com/petermattis/fastcgo can reduce overhead, although it is unsafe.

@ericl ericl removed their assignment Oct 8, 2021
@jon-chuang
Copy link
Contributor

Could the vendored libraries be made to be retrieved by Bazel?

Also, is the intention to have each go worker live in its own process? Or can they share a heap and GC like with Java worker?

@chenk008
Copy link
Contributor Author

Could the vendored libraries be made to be retrieved by Bazel?

Also, is the intention to have each go worker live in its own process? Or can they share a heap and GC like with Java worker?

@jon-chuang Here the vendored libraries are introduced temporarily to help compile. Indeed the bazel can help to manage go dependence.

The go worker process wasn't designed on purpose. I didn't figure out how to manage the relationship between actor and goroutine.

extern void go_worker_execute(GoInt task_type, GoSlice ray_function_info, GoSlice args,
GoSlice return_values);

void go_worker_shutdown();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to Shutdown?

return 0;
}

DataBuffer *RayBufferToDataBuffer(const std::shared_ptr<ray::Buffer> buffer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you ensure the lifetime of the buffer? Who owns the respective buffers and is responsible for destructing it?

@@ -0,0 +1,328 @@
#include "go_worker.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you name this file ".cc" to respect the convention in the rest of the codebase?

}
dataSize := C.ulong(len(b))
// we need copy to avoid go runtime change the address
p := C.malloc(dataSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you malloc here, do you make sure to call free in the execute task context?


RAY_EXPORT DataValue *go_worker_AllocateDataValue(void *data_ptr, size_t data_size,
void *meta_ptr, size_t meta_size) {
auto dv = new DataValue();
Copy link
Contributor

@jon-chuang jon-chuang Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that C++ will free DataValue created this way when passed back to C++ as a ptr? I believe it will not, right? How should it be managed, then?


std::vector<std::shared_ptr<DataValue>> return_value_list;
for (size_t i = 0; i < return_ids.size(); i++) {
return_value_list.push_back(make_shared<DataValue>());
Copy link
Contributor

@jon-chuang jon-chuang Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of making this make_shared?

Since the return value list will get its pointers replaced by pointers generated from the callback without destructing the original make_shared ptr, this would result in memory leak.

@jon-chuang jon-chuang mentioned this pull request Jan 13, 2022
6 tasks
@bveeramani
Copy link
Member

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

  1. Install Black
pip install -I black==21.12b0
  1. Format changed files with Black
curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
chmod +x ./format-changed.sh
./format-changed.sh
rm format-changed.sh
  1. Commit your changes.
git add --all
git commit -m "Format Python code with Black"
  1. Merge master into your branch.
git pull upstream master
  1. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

@jon-chuang
Copy link
Contributor

Could I ask what exactly is the issue with goroutines and actor? I believe this could be similar to async actors in python... In the context of a non-async actor, I don't believe the runtime will even need to spin up additional goroutines....?

@jon-chuang
Copy link
Contributor

Actually I guess I understand your problem.
Async Go worker <- tasks
[run as goroutines]

Go worker is a single runtime context. As long as we ensure runtime spins up a new goroutine per task executed, and we do not block the main loop, we are probably fine. One should use some benchmarks to validate the performance.

@chenk008 what do you think?

Questions: does go worker block on calls to cgo/fastcgo?

@stale
Copy link

stale bot commented Mar 12, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Mar 12, 2022
@kfstorm
Copy link
Member

kfstorm commented Mar 13, 2022

‼️ ACTION REQUIRED ‼️

We've updated our formatting configuration for C++ code. (see #22725)

This PR includes C++ code change. To prevent issues with merging your code, here's what you'll need to do:

  1. Merge the latest changes from upstream/master branch into your branch.
git pull upstream master
git merge upstream/master
  1. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated C++ formatting configuration.

  1. Format changed files.
scripts/format.sh
  1. Commit your changes.
git add --all
git commit -m "Format C++ code"

@stale stale bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Mar 13, 2022
@stale
Copy link

stale bot commented Apr 16, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Apr 16, 2022
@stale
Copy link

stale bot commented Jul 30, 2022

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

@stale stale bot closed this Jul 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge Do not merge this PR! stale The issue is stale. It will be closed within 7 days unless there are further conversation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants