RFC: Adding Pluggable Device For TensorFlow #262
Conversation
Thank you for the RFC! I think the links to the images are broken?
@penpornk Thanks for the review! I put relative image paths in the md file. They display normally in "view file" but appear broken in the md diff. It seems the images are linked to the master branch, where they don't exist yet. Do you have any suggestions? Thanks.
Thank you for writing the RFC!
@jzhoulon "View file" works for me. Sorry I didn't try that before asking! Now that I think about it, I don't think there is a good way to make the images display in the diff mode here, since the images are not in this repo yet. If you use absolute paths (URLs), you'll need to change them back to relative paths before the PR is merged. -- Not worth the hassle.
Thanks for the RFC. We're working on adding a new device for DirectML (#243), so this RFC, along with #257, is of great interest to us. I have a few high-level questions as follows:
@wchao1115 Great questions!
For the current TensorFlow stack, no. Would linking to the CPU-only TensorFlow package work for you (~110MB for 1.15, ~144MB for 2.2.0)? We would love it if you could share your techniques for getting the wheel size down to 65MB as well. Looping in @gunan and @annarev for more details.
Yes. It’s because we happen to have the StreamExecutor C API already in the works. The new TFRT/MLIR TensorFlow stack will have much more flexible support for third-party devices. Since the new stack is fundamentally different from the current stack, we can’t guarantee that any code added to the current stack will work with the new stack right away. So the focus for the current stack is to enable basic device integration, for people that need it now, while minimizing throw-away work. We think StreamExecutor C API (covering a small subset of StreamExecutor routines) and PluggableDevice could be sufficient. (Or we’d like to hear why they might not be enough sooner rather than later. -- Hence the RFCs.) “Stream” is just a keyword meant to represent an ordered sequence of kernels to run. Does DirectML have a kernel queue which would fit this definition? Would DirectML be able to use StreamExecutor C API and PluggableDevice? Is there anything you need that is missing? If so, we can look into adding the missing parts (and making the things you don’t need optional).
Do you mean C APIs for Device and DeviceFactory? Or do you mean different APIs for different devices?
Functionality can be extended by appending members to the structures in StreamExecutorInterface. TensorFlow will check for the existence of members before accessing them. Plugins just need to initialize struct_size using the provided macros so that TensorFlow knows which members are safe to access. @yisitu will give some examples later in a separate reply.
Thanks for the RFC. I have two questions regarding support for dataset prefetch_to_device in the case of PluggableDevice. When using prefetch_to_device, the dataset iterator is on the "accelerator" device and resorts to a remote call to the CPU, which does the data preparation. One of the obstacles is that the remote-call mechanism is guarded by ProcessFunctionLibraryRuntime::GetDeviceContext, which, as of now, is a bunch of hardcoded device type strings. Do you envision a way to allow PluggableDevice to support remote calls? Such features would be very welcome. Are these topics in scope for the PluggableDevice RFC?
Thanks for the comment! These are good questions.
TensorFlow seems to use a lot of hardcoded device strings for branching. Some are device names (front-end names), such as in ProcessFunctionLibraryRuntime::GetDeviceContext, and some are device types (back-end names), such as in ProcessMemoryTypes. I think we can add a new branch path, e.g., a utility function (IsPluggableDevice(device_type)) backed by a global object that contains every device type/name registered from plugins. @annarev do you have comments here? Thanks.
I'm not sure whether the Kernel and Op Implementation C API will support this. Maybe Google can expose IteratorGetNext as a utility function (like TF_NewKernelBuilder) or provide a special API to register an existing implementation with a new device type. @annarev
|
Hi @wchao1115, @penpornk! Please find more detailed examples here: https://github.com/tensorflow/community/pull/257/files#diff-8312b2e074fd41855aa8a03d13e92b91
There will be a public design review meeting for this RFC. Meeting participants are expected to read the RFC ahead of time as the meeting will focus on the remaining issues/questions.
@yisitu Thank you!
Thanks for the suggestion. I registered a group consisting of IteratorV2, MakeIterator, DeleteIterator, AnonymousIterator, AnonymousIteratorV2, IteratorGetNext, IteratorGetNextSync, IteratorGetNextAsOptional, IteratorToStringHandle, and IteratorFromStringHandleV2 on DEVICE_DEFAULT. This somehow places the part of the dataset that's designated for the CPU on the device, which is a bit surprising because I haven't changed the Dataset* registrations. In short, this doesn't seem to work out of the box for datasets, but I'll try to dig into it.
Updates based on recent changes and discussion on RFC PR tensorflow#262
@wchao1115 We have updated the versioning strategy. Changes include:
Could you please help take another look and raise any remaining concerns? Thank you! Please feel free to comment on the file directly on the StreamExecutor C API RFC PR's file changes review page. (Or we can discuss here as usual.)
I added a proposed implementation here: |
Thanks Anna, this looks good!
Questions we got offline (cc: @kulinseth, @wchao1115, @jzhoulon, @Jianhui-Li). Q1: What are the next steps?
Q2: How will the API be released? Will we have a branch where the API will be released in the experimental folder?
Q3: What is the high-level timeline?
Q4: Will this be in TF 2.4?
@wchao1115: Friendly ping. Have you had a chance to look at the updated versioning strategy yet? :)
cc: @jzhoulon @Jianhui-Li @wchao1115 @kulinseth I'd like to wrap up this RFC (i.e., merge this PR) to pave the way for the upcoming implementation PRs. Here are the changes since the last open design review meeting:
If anyone has anything else that needs to be discussed/resolved before we can move forward (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT. @wchao1115 I believe the versioning strategy is not a blocking issue. We can continue discussing and refining the strategy here throughout the experimental phase.
TensorFlow 2.4 branch cut has been announced. If PluggableDevice implementation makes it into TF by 10/21/20 at 5pm PT, it will be in TF 2.4. Otherwise, it won't.
@penpornk Sorry for the delay. The revised versioning doc looks fine for now. We can iterate more when we start implementing the V1 plug-in. A note on deprecation by the producer leaving an attribute zero-initialized: I'm not sure this is safe or can be done in a backward-compatible way, even when zero is defined as an invalid value for the attribute, because the consumer may reject the zero value and fail the call instead. If, in a later minor version, the producer sets the value to zero and assumes the attribute will be safely ignored by older plug-ins, the plug-in may fail. I believe it is only safe to do so when the attribute is also considered optional by the producer. A required attribute must not be left zero-initialized in a later version without a major version bump.
Thanks for clarifying these questions! There are no outstanding design items from our side. There will be a few iterations and adjustments as we get to the implementation PRs and start using the API.
@jzhoulon, @Jianhui-Li, it would be great if you could elaborate on this. We would like to know if you are also targeting TF 2.4. Also, it would be great if there are any implementation PRs we can try out, so we can adapt our implementation and provide feedback.
@kulinseth We are targeting initial PRs for TF 2.4 as a stretch goal and incremental PRs for TF 2.5. At the experimental stage, the initial version won't be able to run full models, but it will be useful for us to test various plugins and further improve the interface.
Thanks for the update. I was also hoping to iron out and test the interface in the experimental stage. It will become clearer after looking at the PR and testing with more plugins.
@wchao1115 @kulinseth Thank you for the quick confirmation! :) @ematejska @alextp I believe there is no outstanding issue now. Could we move forward?
Yes, let's merge this.
LGTM
Hi, @penpornk and others
Please let me know if there are other ways and if I am missing something.
Hi @kulinseth ,
Yes, that's the expected installation process.
Hi all, quick updates:
Hi guys, is there any demo, like a pseudo pluggable device, showing what needs to be done to add a device to TensorFlow? Thanks
Hi @kevint324,
Sounds great. Looking forward to it. |
Reiterating the question: is there a detailed mock pluggable device example or other documentation on this?
Please see here for a detailed tutorial and example of pluggable device development.
This RFC will be open for comment until Monday, July 20th, 2020.
Pluggable device for TensorFlow
Objective
Implement a pluggable device mechanism that allows running existing TensorFlow programs on a new device without requiring users to change their code. Users only need to install a plugin into a specified directory; the mechanism then discovers and plugs in the capabilities offered by the plugin.
This RFC is based on the Modular TensorFlow RFC, which aims to extend the TensorFlow design with plugin capabilities, such as adding support for a new device. The modular device interface is based on the StreamExecutor C API RFC.