Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tutorial and sample code for TensorFlow plugin development #352

Merged
merged 30 commits into from
Oct 19, 2022

Conversation

jzhoulon
Copy link
Contributor

@jzhoulon jzhoulon commented Feb 1, 2021

Add tutorial and code sample for TensorFlow plugin development

@theadactyl
Copy link
Contributor

@mihaimaruseac @penpornk can you review?

@penpornk
Copy link
Member

penpornk commented Feb 1, 2021

@theadactyl Yes, I will review it soon.

Copy link
Member

@penpornk penpornk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this! :)
I think we should make the demo code download necessary third_party files from TensorFlow instead of storing them here.
Also, there is a stray eigen.tar.gz file in //sample/third_party/eigen3. Please remove.

# TensorFlow Plugin demo
This sample is a simple demo shows how to implement, build, install and run a TensorFlow plugin.

##Supported OS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please add a space after ## or the text won't be rendered as header. Same for other places in this file.

Suggested change
##Supported OS
## Supported OS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* Python (version 3.6 and above)

##Build and Run
(This dependes on ([PluggableDevice] PluggableDevice mechanism implementation)[https://github.com/tensorflow/tensorflow/pull/45784] PR merged)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The link was not formatted correctly.
Also, would we need PR #43611 (device alias) as well?

Suggested change
(This dependes on ([PluggableDevice] PluggableDevice mechanism implementation)[https://github.com/tensorflow/tensorflow/pull/45784] PR merged)
This demo depends on the following ongoing PRs:
* [\[PluggableDevice\] PluggableDevice mechanism implementation](https://github.com/tensorflow/tensorflow/pull/45784).
* [\[PluggableDevice\] Add device alias support](https://github.com/tensorflow/tensorflow/pull/43611)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the device alias PR is only needed when plugin author want to register their device as a "GPU" device(legacy gpu support)


##Prerequisites

* Bazel (version 3.1 and above)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a link for Bazel. :)

Suggested change
* Bazel (version 3.1 and above)
* [Bazel](https://docs.bazel.build/versions/master/install-ubuntu.html) (version 3.1 and above)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

###Linux
1. Run the following commands to install a tf-nightly.
```
>>pip install tf-nightly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could you please add a space after >>? Same for other places in this file.

Suggested change
>>pip install tf-nightly
>> pip install tf-nightly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

(This dependes on ([PluggableDevice] PluggableDevice mechanism implementation)[https://github.com/tensorflow/tensorflow/pull/45784] PR merged)

###Linux
1. Run the following commands to install a tf-nightly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
1. Run the following commands to install a tf-nightly.
1. Run the following command to install `tf-nightly`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
```

3. Then built it with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
3. Then built it with
3. Build the plug-in with

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Raise the error to avoid infinitely looping.
"""
if not question:
question = 'Do you wish to build TensorFlow with %s support?' % query_item
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
question = 'Do you wish to build TensorFlow with %s support?' % query_item
question = 'Do you wish to build TensorFlow plug-in with %s support?' % query_item

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if not question:
question = 'Do you wish to build TensorFlow with %s support?' % query_item
if not yes_reply:
yes_reply = '%s support will be enabled for TensorFlow.' % query_item
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
yes_reply = '%s support will be enabled for TensorFlow.' % query_item
yes_reply = '%s support will be enabled for TensorFlow plug-in.' % query_item

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,78 @@
# Description:
# Eigen is a C++ template library for linear algebra: vectors,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file and the folder //sample/eigen3 seems to be redundant to the ones in //sample/third_party. Please delete them if they are indeed duplicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, removed it

@@ -0,0 +1,79 @@
# Description:
# Eigen is a C++ template library for linear algebra: vectors,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the whole third_party/ folder is quite big, can we just download that folder from TensorFlow and keep only what we need? For example, we could add this to build.sh before the bazel build command.

COMMIT=2eea5090c05a1cb99e631aecefd107597afd227b
wget https://github.com/tensorflow/tensorflow/archive/${COMMIT}.tar.gz -O tensorflow.tar.gz
tar -xvzf tensorflow.tar.gz tensorflow-${COMMIT}/third_party/ 
mv tensorflow-${COMMIT}/third_party/eigen3 third_party/
 ...  # Move other files.
rm -rf tensorflow-${COMMIT} tensorflow.tar.gz

This will make the demo code more compact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The demo only depends on eigen_archive and com_google_absl in third-party currently, the version is the same as proper, do we need to download the whole tensorflow package(seems too big).

@PatriceVignola
Copy link

Hi,

I've been following the tutorial and it's pretty useful, but I have one question: what should be the name of the plugin package? In TensorFlow I see that it's explicitly looking for a python package named tensorflow_plugins, but wouldn't that mean that 2 plugins can't be used simultaneously? Is there a way to register a pluggable device with a different name?

Let me know if this isn't the right place to ask this question.

Thanks!

@penpornk
Copy link
Member

penpornk commented May 26, 2021

Hi @PatriceVignola,

Yes, here is the right place to ask.

TensorFlow doesn't exactly look for a package named tensorflow_plugins, it's just going to look for shared libraries (.so files for linux) in a folder named tensorflow-plugins. That folder can have multiple shared libraries from multiple plugin packages, e.g., apu.so, bpu.so, dpu.so, etc.

I think the current pip-package generation script in this tutorial puts more than just the shared library file into the tensorflow-plugins folder when the package is installed. So it might need some adjustments.

@jzhoulon
Copy link
Contributor Author

@PatriceVignola each package can have its own package name,the only requirement is the pluggable device library should be copied to tensorflow-plugins folder, I will update the tutoral example code. Thanks

@PatriceVignola
Copy link

Thanks! That makes sense.

@jzhoulon
Copy link
Contributor Author

@penpornk we have update the graph optimization C API part, including sample code. Thanks.

@PatriceVignola
Copy link

What would be the best way to get the equivalent of libtensorflow_framework.so in Windows without building TensorFlow from source (if any)? It's not included as part of the python package and there doesn't seem to be Windows nightly builds for the C API library.

Thank you!

Copy link

@xiaoshiyilang111 xiaoshiyilang111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to use the sample code with
tf 2.5.0 cpu version, but error happened!
I compiled a library named mydevice_tf.so using gcc, and copy it into python/lib/site-packages/tensorflow-2.5.0-py3.6-linux-x86_64.egg/tensorflow/python/tensorflow-plugins.
when I run python relu.py, I got this:
Unkown attribute: 'MY_Devices' in '/MY_DEVICE:0'

@ematejska
Copy link
Contributor

@penpornk Have all your changes been addressed? If so, could you re-review?

@dsseng
Copy link

dsseng commented Jul 22, 2022

I have (partially) ported this sample code to Rust. Which way should I better post it? Could I publish that template in my repository or submit it to get included into this PR?

For now it only includes unsafe port of most plugin registration features except profiling and optimization.

@penpornk
Copy link
Member

@sh7dm Thank you for your contribution! We will try to merge this PR soon. Then you can open a new PR to add new changes. How does this sound? :)

@dsseng
Copy link

dsseng commented Jul 22, 2022

Great! May I push stuff I have into an own repo at least for now (of course linking to this PR as the initial source)?

@dsseng
Copy link

dsseng commented Jul 23, 2022

Let it be here for now.

dsseng added a commit to dsseng/rust-tf-pluggabledevice that referenced this pull request Jul 23, 2022
@dsseng
Copy link

dsseng commented Jul 23, 2022

I'm now working on extending my Rust port with stuff like device-bound kernels (accessing device from kernel).


1. We need to include those C API header files provided by Core TensorFlow.

2. The built plugin library needs to add dependency to `_pywrap_tensorflow_internal.so`, which is built by Core TensorFlow. `_pywrap_tensorflow_internal.so` contains those C API implementations. If you don’t add this dependency, it will report an "undefined symbol" error when loading the plugin library.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on my Rust port I found that linking to libtensorflow_framework.so.2 works fine as well, but does not add a dependency on Python, which is not necessary and limits the possibilities. I changed those libraries when I found that tests do not build because of linking dependency on Python's C APIs, but with C-native lib everything is fine.

See dsseng/rust-tf-pluggabledevice@d1d812c

@ematejska
Copy link
Contributor

@penpornk Once you approve, we'll get this merged.

Copy link
Member

@penpornk penpornk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jzhoulon I sincerely agologize again for the delay. :(

Would you mind making some more changes to the code? Even though they are tutorial code, we should still add the license message to the source files.

@@ -0,0 +1,279 @@
#include "tensorflow_plugin/src/device/cpu/cpu_device_plugin.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the usual license message to every C++ source/header files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -0,0 +1,6 @@
"""This is a module for dummy test."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the usual license message to all python source files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@penpornk penpornk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the tutorial and I'm so sorry again for the delay!

@theadactyl theadactyl merged commit 22d927e into tensorflow:master Oct 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants