
Support Distribute Lookup Table #9211

Closed
15 tasks done
jacquesqiao opened this issue Mar 19, 2018 · 4 comments
jacquesqiao (Member) commented Mar 19, 2018

Project

https://github.com/PaddlePaddle/Paddle/projects/56

Tasks

Problems with the current design

  1. Problem: all prefetch input and output vars must share the same variables, because there is only one prefetch thread block and one prefetch op on the pserver, and it has to take a single input and output. So each split_ids_op -> prefetch_op -> concat_op chain must be executed one by one and cannot run in parallel (see the sketch after this list). There is also a lot of code in the dist transpiler just to insert and delete these ops.
    Solution: a better approach may be to have only one prefetch_op and one prefetch_grad_op that do not depend on Variable, but instead use some internal data structure to communicate with the pserver.
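
To make the serialization problem concrete, here is a minimal, self-contained Python sketch of the current data flow. It is not real Paddle code: `split_ids`, `prefetch`, and `concat` are hypothetical stand-ins for split_ids_op, prefetch_op, and concat_op, and each dict stands in for one pserver's table shard.

```python
# Illustrative sketch only, not real Paddle code: split_ids, prefetch and
# concat are hypothetical stand-ins for split_ids_op, prefetch_op and
# concat_op; each dict below stands in for one pserver's table shard.

def split_ids(ids, num_pservers):
    """Route each id to its pserver shard (models split_ids_op)."""
    shards = [[] for _ in range(num_pservers)]
    for i in ids:
        shards[i % num_pservers].append(i)
    return shards

def prefetch(shard_ids, table):
    """Fetch the rows for one shard from its pserver (models prefetch_op)."""
    return {i: table[i] for i in shard_ids}

def concat(row_maps, ids):
    """Reassemble fetched rows in the original id order (models concat_op)."""
    merged = {}
    for rows in row_maps:
        merged.update(rows)
    return [merged[i] for i in ids]

# Two fake pserver shards: shard p holds the ids with id % 2 == p.
tables = [{i: [float(i)] * 4 for i in range(p, 100, 2)} for p in range(2)]

ids = [3, 8, 15, 42]
shards = split_ids(ids, num_pservers=2)
# Because the pserver has a single prefetch block with one input/output
# variable, these calls are effectively serialized today; the proposal
# is to let them run concurrently.
rows = [prefetch(shard, tables[p]) for p, shard in enumerate(shards)]
print(concat(rows, ids))
```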
helinwang (Contributor) commented Mar 19, 2018

Sorry, I tried to understand more about the distributed lookup table but still have some questions; maybe it's because I don't see the full picture:

In our ListenAndServeOp, we already have the lookup table functionality: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/operators/listen_and_serv_op.cc#L119
Furthermore, ListenAndServeOp is distributed across different parameter servers, supports sparse updates, and runs the optimization block too.
From my understanding, the distributed lookup table is a subset of ListenAndServeOp in terms of functionality. I'm curious what benefit a new distributed lookup table can bring, and what we are planning to do with ListenAndServeOp?

Yancey1989 (Contributor) commented

Hi @helinwang,
I will try to explain the background.
In the current design of the distributed architecture, each trainer holds all the parameters of a model, but sometimes a parameter is so large (e.g., an embedding layer with a very large dict_size) that it cannot fit in one trainer's memory. So we need an approach that stores the parameter remotely and prefetches the needed rows before using it. Currently, we have two plans to implement this:

  • Prefetch row data from the parameter servers; this suits Fluid.
  • Store the large parameter in a storage service such as Memcached; this suits Abacus.

> In our ListenAndServeOp, we already have the lookup table functionality:

You're right: for the Fluid plan we don't need to do much more work on ListenAndServeOp; the trainer would just prefetch the correct rows before each mini-batch (see the sketch below).
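
For what it's worth, a minimal sketch of that per-mini-batch prefetch in plain Python with NumPy. The `RemoteTable` class and its `fetch_rows` method are hypothetical stand-ins for the sharded table on the parameter servers; real Fluid would wire this through the prefetch ops rather than an explicit call:

```python
import numpy as np

EMB_DIM = 8

class RemoteTable:
    """Hypothetical stand-in for the embedding table held on the pservers."""
    def __init__(self, dict_size):
        rng = np.random.default_rng(0)
        self.rows = rng.standard_normal((dict_size, EMB_DIM))

    def fetch_rows(self, ids):
        # Only the rows a batch actually touches cross the wire;
        # the full table never has to fit in trainer memory.
        return {i: self.rows[i] for i in ids}

def run_minibatch(table, batch_ids):
    local = table.fetch_rows(set(batch_ids))        # prefetch step
    return np.stack([local[i] for i in batch_ids])  # local lookup, as lookup_table would do

table = RemoteTable(dict_size=100_000)  # pretend this lives on remote pservers
emb = run_minibatch(table, batch_ids=[7, 7, 123, 9_999])
print(emb.shape)  # (4, 8)
```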

helinwang (Contributor) commented Mar 21, 2018

@Yancey1989 Thank you!

jacquesqiao (Member, Author) commented Mar 22, 2018

(image attached; not reproduced as text)
