Support Distribute Lookup Table #9211
Sorry, I tried to understand more about the distributed lookup table but still have some questions; maybe it's because I don't understand the full picture. In our ListenAndServeOp, we already have the lookup table functionality: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/listen_and_serv_op.cc#L119
Hi @helinwang
You're right; for the plan with Fluid, we don't need to do much more work with ListenAndServeOp.
@Yancey1989 Thank you!
Project
https://github.com/PaddlePaddle/Paddle/projects/56
Tasks
- Add distributed lookup table design (with Abacus): #9075
- Add design doc for lookup remote table in Fluid: #9068
- Support empty tensor: #9338
Operators
- prefetch_op: gets values from the pserver by ids and outputs a SelectedRows as the parameter for lookup_table_op. @jacquesqiao: use split_ids_op -> prefetch_op -> concat_op to compose a prefetch_op (see the sketch after this list).
- sum_op (used on the pserver to sum the split gradients before sgd_op; see Transpilers below).
- lookup_table_op: this op should take its parameter (SelectedRows) from prefetch_op. When using prefetch, we should remove the initialize_op for its parameter W. (Lookup table support selected rows as parameter #9575)
- sgd_op: apply the gradient (SelectedRows) to the table parameter (SelectedRows). (Sgd support update selected rows #9597)
- distribute_table_initialize_op: should initialize a shard of SelectedRows on the parameter server by shard_id. In the future, it may need to read the parameter from a distributed file system. (Initialize large table value randomly #9787)
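To make the prefetch composition concrete, here is a minimal, runnable model of the split_ids_op -> prefetch_op -> concat_op dataflow in plain Python with numpy. This is not the Fluid API: the sharding rule (`id % num_shards`), `shard_rows`, and `EMB_DIM` are assumptions for illustration only.

```python
# Model of split_ids_op -> prefetch_op -> concat_op (plain Python + numpy,
# NOT the Fluid API; names and the sharding rule are hypothetical).
import numpy as np

EMB_DIM = 4
NUM_SHARDS = 2
# Each pserver shard holds a slice of the table, keyed by row id.
shard_rows = [dict() for _ in range(NUM_SHARDS)]
for i in range(10):
    shard_rows[i % NUM_SHARDS][i] = np.full(EMB_DIM, float(i))

def split_ids(ids, num_shards):
    """split_ids_op: route each id to the shard that owns it (id % num_shards)."""
    buckets = [[] for _ in range(num_shards)]
    for pos, i in enumerate(ids):
        buckets[i % num_shards].append((pos, i))
    return buckets

def prefetch(bucket, shard):
    """prefetch_op: fetch the requested rows from one pserver shard."""
    return [(pos, shard[i]) for pos, i in bucket]

def concat(parts, n):
    """concat_op: restore the original order of the looked-up rows."""
    out = np.zeros((n, EMB_DIM))
    for part in parts:
        for pos, row in part:
            out[pos] = row
    return out

ids = [3, 0, 7, 3]
buckets = split_ids(ids, NUM_SHARDS)
parts = [prefetch(b, s) for b, s in zip(buckets, shard_rows)]
print(concat(parts, len(ids)))  # rows for ids 3, 0, 7, 3, in their original order
```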
Sparse Table
- Support an auto-grown sparse table and lookup of nonexistent keys (a sketch follows).
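A hedged sketch of the auto-grown semantics: looking up a nonexistent key creates the row on the fly instead of failing, and the SGD step touches only the looked-up rows, in the spirit of sgd_op on SelectedRows. This is plain Python with numpy, not pserver code; the class and method names are hypothetical.

```python
# Sketch of an auto-grown sparse table with SelectedRows-style sparse update.
import numpy as np

class SparseTable:
    def __init__(self, emb_dim, seed=0):
        self.emb_dim = emb_dim
        self.rows = {}                      # id -> embedding row
        self.rng = np.random.RandomState(seed)

    def lookup(self, ids):
        # Auto-grow: initialize a missing row randomly instead of failing.
        for i in ids:
            if i not in self.rows:
                self.rows[i] = self.rng.uniform(-0.1, 0.1, self.emb_dim)
        return np.stack([self.rows[i] for i in ids])

    def sgd_update(self, ids, grads, lr=0.1):
        # Sparse update: only the rows named by `ids` change.
        for i, g in zip(ids, grads):
            self.rows[i] -= lr * g

table = SparseTable(emb_dim=4)
emb = table.lookup([42, 7, 42])             # keys 42 and 7 are created on first use
table.sgd_update([42, 7], np.ones((2, 4)))
print(sorted(table.rows))                    # [7, 42]
```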
Transpilers (Dist transpiler support prefetch #9714)
- Replace lookup_table_op with split_ids_op -> prefetch_op -> concat_op (a toy model of this rewrite follows the list).
- Add split_ids_op -> send_vars_op to split table@grad and send the pieces to the pserver.
- Add table_optimize_block [sum(splited_grad) -> sgd_op] to the pserver_program.
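A toy model of the rewrite described above. The real dist transpiler works on Fluid program descs, not on lists of dicts; `transpile_prefetch` and the variable names are hypothetical, chosen only to show the shape of the transformation.

```python
# Toy model of the transpiler pass: replace each lookup_table op with
# split_ids -> prefetch -> concat. Ops are plain dicts for illustration.
def transpile_prefetch(ops):
    out = []
    for op in ops:
        if op["type"] == "lookup_table":
            ids, result = op["ids"], op["out"]
            out.append({"type": "split_ids", "ids": ids, "out": "ids@shards"})
            out.append({"type": "prefetch", "ids": "ids@shards", "out": "rows@shards"})
            out.append({"type": "concat", "in": "rows@shards", "out": result})
        else:
            out.append(op)
    return out

prog = [{"type": "lookup_table", "ids": "Ids", "out": "Emb"},
        {"type": "mul", "in": "Emb", "out": "FC"}]
for op in transpile_prefetch(prog):
    print(op["type"])   # split_ids, prefetch, concat, mul
```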
Problems with the current design
- Problem: all prefetch inputs and outputs must share the same variables, because there is only one prefetch thread block and one prefetch op on the pserver, which has to take a single input and output. As a result, the split_ids_op -> prefetch_op -> concat_op sets must be executed one by one and cannot run in parallel, and the dist transpiler needs a lot of code to insert and delete ops.
- Solution: a better solution may be to have only one prefetch_op and prefetch_grad_op that do not depend on Variables, but use some internal data structure to communicate with the pserver.
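One way to read that proposal, sketched under assumptions: each prefetch call is an independent request carried by an internal structure (here, a thread pool returning futures) rather than a shared scope variable, so lookups can overlap freely. All names here are hypothetical and this is not a design commitment.

```python
# Sketch: one prefetcher whose requests go through an internal structure
# (futures) instead of shared Variables, allowing parallel prefetch.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

TABLE = {i: np.full(4, float(i)) for i in range(100)}  # stand-in for a pserver shard

class Prefetcher:
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def prefetch(self, ids):
        # Each call is an independent request; nothing is written into a
        # shared variable, so concurrent calls do not serialize on state.
        return self.pool.submit(lambda: np.stack([TABLE[i] for i in ids]))

p = Prefetcher()
futures = [p.prefetch(ids) for ids in ([1, 2], [3, 4], [5, 6])]  # issued concurrently
print([f.result().shape for f in futures])  # [(2, 4), (2, 4), (2, 4)]
```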