I started the master, pserver, and trainer in a Docker container, but the trainer cannot get the PServer address from etcd. The error log is below:
```
['/work/data/uci_housing_train-*-of-*']
ERRO[0000] Get task failed, sleep 3 seconds and continue, no more available task
I0719 08:12:24.602708 824 Util.cpp:166] commandline:
I0719 08:12:24.607319 824 GradientMachine.cpp:85] Initing parameters..
I0719 08:12:24.607365 824 GradientMachine.cpp:92] Init parameters done.
INFO[0000] Connected to etcd: localhost:2379
I0719 08:12:24.962303 824 NewRemoteParameterUpdater.cpp:68] paddle_begin_init_params start
I0719 08:12:24.962774 824 NewRemoteParameterUpdater.cpp:71] old param config: name: "___fc_layer_0__.w0" size: 13 initial_mean: 0 initial_std: 0.27735009811261457 dims: 13 dims: 1 initial_strategy: 0 initial_smart: true para_id: 0
INFO[0000] Get psKey= /ps/0 error, context canceled
ERRO[0003] Get task failed, sleep 3 seconds and continue, no more available task
ERRO[0006] Get task failed, sleep 3 seconds and continue, no more available task
ERRO[0009] Get task failed, sleep 3 seconds and continue, no more available task
INFO[0010] Get psKey= /ps/0 error, context canceled
```
Maybe @typhoonzero's PR #2948 fixes it: https://github.com/PaddlePaddle/Paddle/pull/2948/files#diff-116414f87c9d5525b7a3b4738d3ab49cR72
Cool, I will close this issue.
@Yancey1989 Thanks for closing the issue, but let's close it only once the fix has landed in the develop branch :p