Ubuntu 16.04 ros-kinetic crashes when loading the node #9

Closed
Pedrous opened this issue Mar 24, 2020 · 26 comments

Comments

@Pedrous

Pedrous commented Mar 24, 2020

Hey,

I am using Ubuntu 16.04 and I built tensorflow 1.8.0 from source with the following command:
bazel build --config=opt --define framework_shared_object=false tensorflow:libtensorflow_cc.so

I built tensorflow_ros_cpp with this command:
catkin build tensorflow_ros_cpp --cmake-args -DTF_BAZEL_LIBRARY="/home/petrimanninen/tensorflow/bazel-bin/tensorflow/libtensorflow_framework.so" -DTF_BAZEL_SRC_DIR="/home/petrimanninen/tensorflow"

I made a dummy node with the following CMakeLists.txt:

cmake_minimum_required(VERSION 2.8.3)
project(testpkg)
add_definitions(-std=c++11)
find_package(catkin_simple 0.1.0 REQUIRED)
catkin_simple(ALL_DEPS_REQUIRED)
cs_add_executable(${PROJECT_NAME}_node src/test_node.cpp)

And with the following package.xml:

<?xml version="1.0"?>
<package format="2">
  <name>testpkg</name>
  <version>0.0.0</version>
  <description>The testpkg package</description>

  <buildtool_depend>catkin</buildtool_depend>
  <buildtool_depend>catkin_simple</buildtool_depend>
  <depend>roscpp</depend>
  <depend>tensorflow_ros_cpp</depend>
  <export>
  </export>
</package>

If I add the tensorflow_ros_cpp package as a dependency of my dummy package, the node crashes immediately with a segmentation fault on startup (without the dependency it runs just fine). The GDB backtrace is as follows:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4b9852b in google::protobuf::Arena::OnArenaAllocation(std::type_info const*, unsigned long) const ()
   from /home/petrimanninen/segmap_ws/devel/lib/libtensorflow_framework.so
(gdb) bt
#0  0x00007ffff4b9852b in google::protobuf::Arena::OnArenaAllocation(std::type_info const*, unsigned long) const ()
   from /home/petrimanninen/segmap_ws/devel/lib/libtensorflow_framework.so
#1  0x00007ffff4ac9201 in google::protobuf::FileDescriptorProto* google::protobuf::Arena::CreateMessage<google::protobuf::FileDescriptorProto>(google::protobuf::Arena*)
    () from /home/petrimanninen/segmap_ws/devel/lib/libtensorflow_framework.so
#2  0x00007ffff4211038 in google::protobuf::MessageLite::ParseFromArray(void const*, int) () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.9
#3  0x00007ffff425a1b6 in google::protobuf::EncodedDescriptorDatabase::Add(void const*, int) () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.9
#4  0x00007ffff421b9b8 in google::protobuf::DescriptorPool::InternalAddGeneratedFile(void const*, int) () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.9
#5  0x00007ffff424a48c in google::protobuf::protobuf_AddDesc_google_2fprotobuf_2fdescriptor_2eproto() () from /usr/lib/x86_64-linux-gnu/libprotobuf.so.9
#6  0x00007ffff7de76ca in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffd5d8, env=env@entry=0x7fffffffd5e8) at dl-init.c:72
#7  0x00007ffff7de77db in call_init (env=0x7fffffffd5e8, argv=0x7fffffffd5d8, argc=1, l=<optimized out>) at dl-init.c:30
#8  _dl_init (main_map=0x7ffff7ffe168, argc=1, argv=0x7fffffffd5d8, env=0x7fffffffd5e8) at dl-init.c:120
#9  0x00007ffff7dd7c6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#10 0x0000000000000001 in ?? ()
#11 0x00007fffffffd9e3 in ?? ()
#12 0x0000000000000000 in ?? ()

I assume that ROS is supposed to use the protobuf version provided by Ubuntu 16.04, but the backtrace shows frames from both the system libprotobuf.so.9 and the protobuf code inside libtensorflow_framework.so, so the two seem to get mixed and then crash. I cannot quite figure out what to do about it.

Could someone please help me with this?
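
A quick way to see the mix described above is to count which shared objects the backtrace frames come from. The following is only a minimal sketch: `libs_in_backtrace` is a hypothetical helper name, not part of ROS or tensorflow_ros_cpp, and it assumes the output of `bt` has been saved as plain text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize which shared objects appear in a saved GDB backtrace.
# Reads the backtrace text on stdin and prints "<count> <library path>" per library.
libs_in_backtrace() {
    grep -oE 'from [^ ]+\.so[^ ]*' |   # keep only the "from /path/to/lib.so[.N]" parts
        sed 's/^from //' |             # drop the leading "from "
        sort | uniq -c |               # count occurrences of each library
        awk '{print $1, $2}'           # normalize the padded output of uniq -c
}

# Usage (assumes bt.txt holds the saved backtrace):
#   libs_in_backtrace < bt.txt
```

If more than one library shows up for the `google::protobuf` frames, as it does here, that supports the suspicion that two protobuf copies are loaded into the same process.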

@peci1
Member

peci1 commented Mar 24, 2020

Hi, can you try building and running https://github.com/tradr-project/tensorflow_ros_test ?

@Pedrous
Author

Pedrous commented Mar 25, 2020

Thank you for the fast reply. I tried it and the result is the same. I also tried the other branch, kinetic-devel, which should deal with some of the linking problems, but it made no difference.

@peci1
Member

peci1 commented Mar 25, 2020

Can you please open a terminal in which you source this workspace and post the output of the following commands here?

ldd /home/petrimanninen/segmap_ws/devel/lib/libtensorflow_framework.so
echo $LD_LIBRARY_PATH
echo /home/petrimanninen/segmap_ws/build/tensorflow_ros_cpp/CMakeCache.txt | grep TF_

@peci1
Member

peci1 commented Mar 25, 2020

And did you also try it with the pip-installed tensorflow? Does it work with that one? If you try it, delete the whole build directory first. I'm not sure how well the library copes with changing the tensorflow "provider".

@Pedrous
Author

Pedrous commented Mar 25, 2020

ldd /home/petrimanninen/segmap_ws/devel/lib/libtensorflow_framework.so

linux-vdso.so.1 =>  (0x00007ffca592d000)
libcublas.so.9.0 => /usr/local/cuda-9.0/lib64/libcublas.so.9.0 (0x00007f177c299000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f177b093000)
libcudnn.so.7 => /usr/lib/x86_64-linux-gnu/libcudnn.so.7 (0x00007f1769bfc000)
libcufft.so.9.0 => /usr/local/cuda-9.0/lib64/libcufft.so.9.0 (0x00007f1761b5b000)
libcurand.so.9.0 => /usr/local/cuda-9.0/lib64/libcurand.so.9.0 (0x00007f175dbf7000)
libcudart.so.9.0 => /usr/local/cuda-9.0/lib64/libcudart.so.9.0 (0x00007f175d98a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f175d786000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f175d47d000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f175d260000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f175cede000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f175ccc8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f175c8fe000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1780e86000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f175c6f6000)

echo $LD_LIBRARY_PATH
/home/petrimanninen/segmap_ws/devel/lib:/opt/ros/kinetic/lib:/opt/ros/kinetic/lib/x86_64-linux-gnu:/usr/local/cuda-9.0/lib64

I guess the third one should be with cat, right?
cat /home/petrimanninen/segmap_ws/build/tensorflow_ros_cpp/CMakeCache.txt | grep TF_

DISABLE_TF_BAZEL_SEARCH:BOOL=OFF
DISABLE_TF_CATKIN_SEARCH:BOOL=OFF
DISABLE_TF_PIP_SEARCH:BOOL=OFF
FORCE_TF_BAZEL_SEARCH:BOOL=OFF
FORCE_TF_CATKIN_SEARCH:BOOL=OFF
FORCE_TF_PIP_SEARCH:BOOL=OFF
TF_BAZEL_LIBRARY:STRING=/home/petrimanninen/tensorflow/bazel-bin/tensorflow/libtensorflow_framework.so
TF_BAZEL_SRC_DIR:STRING=/home/petrimanninen/tensorflow
TF_BAZEL_USE_SYSTEM_PROTOBUF:BOOL=OFF
TF_PIP_DISABLE_SEARCH_FOR_GPU_VERSION:BOOL=OFF
TF_PIP_EXECUTABLE:STRING=pip2.7
TF_PIP_PATH:STRING=
TF_PYTHON_LIBRARY:STRING=
TF_PYTHON_VERSION:STRING=2.7
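
To read a single entry rather than grep the whole cache, something like the following could be used. It is a minimal sketch that assumes the usual `NAME:TYPE=VALUE` layout of CMakeCache.txt, as seen in the output above; `cmake_cache_get` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one entry from a CMakeCache.txt file.
# Cache entries have the form NAME:TYPE=VALUE.
# $1 = path to CMakeCache.txt, $2 = variable name
cmake_cache_get() {
    # Strip the "NAME:TYPE=" prefix and print only lines where it matched.
    sed -n "s/^$2:[A-Z]*=//p" "$1"
}

# Usage (the path is the one from this thread):
#   cmake_cache_get /home/petrimanninen/segmap_ws/build/tensorflow_ros_cpp/CMakeCache.txt TF_BAZEL_LIBRARY
```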

To answer your second message: I first installed the pip version of Tensorflow 1.8.0, but I was unable to get it working because of the C++11 ABI incompatibilities. I first tried to build tensorflow_ros_cpp with all the possible settings mentioned here:
https://github.com/ethz-asl/segmap/wiki/FAQ#q-issues-compiling-tensorflow_ros_cpp

@Pedrous
Author

Pedrous commented Mar 25, 2020

Now that I rebuilt it again in release mode, I get the following output for tensorflow_ros_test:

Starting program: /home/petrimanninen/segmap_ws/devel/lib/tensorflow_ros_test/tensorflow_ros_test_node
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
2020-03-25 19:12:25.216772: E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "" config: gpu_options { allow_growth: true }} Registered factories are {}.
Not found: No session factory registered for the given session options: {target: "" config: gpu_options { allow_growth: true }} Registered factories are {}.
[Inferior 1 (process 29608) exited with code 01]
(gdb) bt
No stack.

@Pedrous
Author

Pedrous commented Mar 25, 2020

I found this suggestion and tried it but it didn't solve the problem:
https://github.com/tensorflow/tensorflow/issues/3308#issuecomment-233799915

I added this to CMakeLists.txt:
set(CMAKE_SHARED_LINKER_FLAGS "-Wl,--allow-multiple-definition -Wl,--whole-archive")

@peci1
Member

peci1 commented Mar 25, 2020

Hi. I'd first like to verify if at least the pip version works on your machine. Because if not, then there is something else wrong. Here are the steps to get it working in a clean workspace (assuming tensorflow 1.8.0 from pip is already installed):

mkdir -p tensorflow_ws/src
cd tensorflow_ws/src
git clone https://github.com/tradr-project/tensorflow_ros_cpp
git clone https://github.com/tradr-project/tensorflow_ros_test
cd tensorflow_ros_test/
git checkout kinetic-devel
cd ..
cd ..
catkin init
catkin config --extend /opt/ros/kinetic/
catkin build --cmake-args -DFORCE_TF_PIP_SEARCH=ON
source devel/setup.bash
rosrun tensorflow_ros_test tensorflow_ros_test_node

I see this output:

2020-03-26 00:37:20.407566: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Tensor<type: float shape: [] values: 6>
6

If you get this working, we can go on with the build. Just to make sure - which version of tensorflow are you building from source? Is it also 1.8.0? As you can see in https://github.com/tradr-project/tensorflow_ros_cpp#ubuntu-1604-64bits-python-276-ros-kinetic, that one is tested and it worked for me.

Moreover, the linker flags you found should be added to the tensorflow build, not to tensorflow_ros_cpp. This might help.

@Pedrous
Author

Pedrous commented Mar 26, 2020

I followed your instructions to set up a new workspace with the correct packages, but I am unable to build it because tensorflow is not found. I have created a virtual environment in which I pip-installed tensorflow-gpu 1.8.0 like this (I used python3, because with python2 I get a setuptools error when trying to install tensorflow):

virtualenv -p python3 ~/segmappyenv
source ~/segmappyenv/bin/activate
pip install --upgrade pip
pip install catkin_pkg empy pyyaml
pip install tensorflow-gpu==1.8.0

Then I thought that maybe I should try to solve it with a python2 virtualenv. With python2, when I try to install tensorflow, I get:
ERROR: Package 'setuptools' requires a different Python: 2.7.12 not in '>=3.5'
I found a solution in pypa/virtualenv#1493, so I tried creating the virtualenv without setuptools and then separately installing a setuptools version below 45, like this:

virtualenv  --no-setuptools ~/segmappyenv
source ~/segmappyenv/bin/activate
pip install --upgrade pip
pip install "setuptools<45"
pip install catkin_pkg empy pyyaml
pip install tensorflow-gpu==1.8.0
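
The choice of pin in the steps above can be sketched as a small function. This is an illustration only: `setuptools_spec` is a hypothetical helper, and the version boundary is taken from the pip error quoted above (setuptools 45 and later requiring Python `>=3.5`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a setuptools requirement for a given Python version.
# Assumption: setuptools 45+ needs Python >= 3.5, as the pip error in this comment reports.
# $1 = Python major version, $2 = Python minor version
setuptools_spec() {
    if [ "$1" -lt 3 ] || { [ "$1" -eq 3 ] && [ "$2" -lt 5 ]; }; then
        echo 'setuptools<45'   # older interpreters need the pinned version
    else
        echo 'setuptools'      # no pin needed
    fi
}

# Usage inside an activated virtualenv (python is the virtualenv's interpreter):
#   pip install "$(setuptools_spec $(python -c 'import sys; print("%d %d" % sys.version_info[:2])'))"
```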

Now I was able to install the python2 version and build tensorflow_ws. When I run:
rosrun tensorflow_ros_test tensorflow_ros_test_node

I get:
2020-03-26 11:57:58.635547: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-26 11:57:58.685068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-26 11:57:58.685360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Quadro M2000M major: 5 minor: 0 memoryClockRate(GHz): 1.137
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 2.08GiB
2020-03-26 11:57:58.685380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2020-03-26 11:57:59.313967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-26 11:57:59.313997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2020-03-26 11:57:59.314005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2020-03-26 11:57:59.314120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1780 MB memory) -> physical GPU (device: 0, name: Quadro M2000M, pci bus id: 0000:01:00.0, compute capability: 5.0)
Tensor<type: float shape: [] values: 6>
6

But if I try the master branch of tensorflow_ros_test, I cannot build the package because I get:

CMakeFiles/tensorflow_ros_test_node.dir/src/test.cpp.o: In function `main':
test.cpp:(.text.startup+0xe9): undefined reference to `tensorflow::Status::ToString[abi:cxx11]() const'
test.cpp:(.text.startup+0x26f): undefined reference to `tensorflow::ReadBinaryProto(tensorflow::Env*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::MessageLite*)'
test.cpp:(.text.startup+0x2ff): undefined reference to `tensorflow::Status::ToString[abi:cxx11]() const'
test.cpp:(.text.startup+0x3ba): undefined reference to `tensorflow::Status::ToString[abi:cxx11]() const'
test.cpp:(.text.startup+0x661): undefined reference to `tensorflow::Status::ToString[abi:cxx11]() const'
test.cpp:(.text.startup+0x6f7): undefined reference to `tensorflow::Tensor::DebugString[abi:cxx11]() const'
collect2: error: ld returned 1 exit status
CMakeFiles/tensorflow_ros_test_node.dir/build.make:146: recipe for target '/home/petrimanninen/segmap_ws/devel/lib/tensorflow_ros_test/tensorflow_ros_test_node' failed
make[2]: *** [/home/petrimanninen/segmap_ws/devel/lib/tensorflow_ros_test/tensorflow_ros_test_node] Error 1
CMakeFiles/Makefile2:994: recipe for target 'CMakeFiles/tensorflow_ros_test_node.dir/all' failed
make[1]: *** [CMakeFiles/tensorflow_ros_test_node.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2
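
The `[abi:cxx11]` tags in the undefined references suggest that the test node was compiled against the new libstdc++ ABI while the linked TensorFlow library was not. One rough way to check a library is to look for `cxx11` in its exported symbols. The sketch below is a heuristic under that assumption, not a definitive test, and `abi_flavor` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess from a symbol listing whether a library uses the new libstdc++ ABI.
# Reads symbol names on stdin (mangled or demangled).
# Prints "new" if any symbol mentions cxx11, otherwise "old".
abi_flavor() {
    if grep -q 'cxx11'; then
        echo new
    else
        echo old
    fi
}

# Usage (the library path is a placeholder):
#   nm -D --defined-only /path/to/libtensorflow_framework.so | abi_flavor
```

A result of "old" for the TensorFlow library combined with a node built with the default GCC 5 settings would match the linker errors above.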

@peci1
Member

peci1 commented Mar 26, 2020

Good, so the build works for you. The master branch of tensorflow_ros_test is expected to fail on Ubuntu 16.04 with ROS Kinetic.

Aside from that, it seems you have some serious problems with your Python setup :) Things like that usually happen to me when I follow the warnings from pip and run sudo pip install -U pip. This is a very dangerous command on Ubuntu. The only pip you should use is the one from the system package python-pip. Also, I'm not sure how well ROS works with virtualenvs. I think this kind of setup isn't officially supported.

So the basic functionality works on your system. Now we can proceed with diagnosing the bazel build. Did you manage to add the required linker flags there?

@Pedrous
Author

Pedrous commented Mar 26, 2020

No, I actually don't have a clue where to put them :).

@Pedrous
Author

Pedrous commented Mar 26, 2020

I am also confused about the bazel build. I first used this command from https://www.tensorflow.org/install/source:
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

But it seems that it is for the pip installation? Then I found this instruction at https://github.com/tradr-project/tensorflow_ros_cpp#prerequisites-2:
bazel build --config=opt --define framework_shared_object=false tensorflow:libtensorflow_cc.so

So my last build used this one. But it does not have the --config=cuda option, and I don't know if I need it or not. Should I also run the bazel build again?

@Pedrous
Author

Pedrous commented Mar 26, 2020

I guess that this might be the correct way to pass the linker options, so I am trying it now:
bazel build --config=opt --linkopt="-Wl,--allow-multiple-definition -Wl,--whole-archive" --define framework_shared_object=false tensorflow:libtensorflow_cc.so

@peci1
Member

peci1 commented Mar 26, 2020

Thanks for trying. I know building Tensorflow is a really tedious task.

As for the cuda config, I set it during the configure step: https://www.tensorflow.org/install/source#sample_session (click the plus sign to show the sample config session). When you tell the configure script you want CUDA built in, it should not be needed to add anything to the build command (except for the linker flags, which I'm curious if they'll work or not).

@Pedrous
Author

Pedrous commented Mar 26, 2020

No problem, thank you for helping :)

I was unable to build tensorflow with the linker parameters. The error is:

ERROR: /home/petrimanninen/tensorflow/tensorflow/BUILD:502:1: Linking of rule '//tensorflow:libtensorflow_cc.so' failed (Exit 1)
/usr/bin/ld.gold: --allow-multiple-definition -Wl: unknown option
/usr/bin/ld.gold: use the --help option for usage information
collect2: error: ld returned 1 exit status
Target //tensorflow:libtensorflow_cc.so failed to build
Use --verbose_failures to see the command lines of failed build steps.

And with --verbose_failures:

ERROR: /home/petrimanninen/tensorflow/tensorflow/BUILD:502:1: Linking of rule '//tensorflow:libtensorflow_cc.so' failed (Exit 1): gcc failed: error executing command 
  (cd /home/petrimanninen/.cache/bazel/_bazel_petrimanninen/835ffab30f5fd8c0aa4db9abfef9aa17/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=/home/petrimanninen/segmap_ws/devel/lib:/home/petrimanninen/tensorflow_ws/devel/lib:/opt/ros/kinetic/lib:/opt/ros/kinetic/lib/x86_64-linux-gnu:/usr/local/cuda-9.0/lib64 \
    PATH=/home/petrimanninen/segmap_ws/devel/bin:/opt/ros/kinetic/bin:/usr/local/cuda-9.0/bin:/home/petrimanninen/bin:/home/petrimanninen/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
    PWD=/proc/self/cwd \
  /usr/bin/gcc -shared -o bazel-out/k8-opt/bin/tensorflow/libtensorflow_cc.so -z defs -s -Wl,--version-script tensorflow/tf_version_script.lds '-Wl,-rpath,$ORIGIN/' -Wl,-soname,libtensorflow_cc.so -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -Wl,-z,notext -pthread '-fuse-ld=gold' -Wl,-no-as-needed -Wl,-z,relro,-z,now -B/usr/bin -B/usr/bin -pass-exit-codes -Wl,--gc-sections '-Wl,--allow-multiple-definition -Wl,--whole-archive' -Wl,@bazel-out/k8-opt/bin/tensorflow/libtensorflow_cc.so-2.params)
/usr/bin/ld.gold: --allow-multiple-definition -Wl: unknown option
/usr/bin/ld.gold: use the --help option for usage information
collect2: error: ld returned 1 exit status
Target //tensorflow:libtensorflow_cc.so failed to build
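
The verbose command line shows both flags reaching gcc as one quoted argument ('-Wl,--allow-multiple-definition -Wl,--whole-archive'), which is why ld.gold reports an unknown option. One possible workaround, which was not tried in this thread, is to give each flag its own --linkopt, since bazel accumulates repeated --linkopt options. The sketch below only shows the splitting step; `linkopt_args` is a hypothetical helper, and whether the resulting build succeeds for TensorFlow 1.8.0 is an open assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit one --linkopt argument per linker flag, one per line.
# $1 = space-separated linker flags
linkopt_args() {
    local -a flags
    local flag
    read -r -a flags <<< "$1"          # split on whitespace without glob expansion
    for flag in "${flags[@]}"; do
        printf -- '--linkopt=%s\n' "$flag"
    done
}

# Usage (untested against this TensorFlow version):
#   mapfile -t opts < <(linkopt_args '-Wl,--allow-multiple-definition -Wl,--whole-archive')
#   bazel build --config=opt "${opts[@]}" --define framework_shared_object=false tensorflow:libtensorflow_cc.so
```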

@peci1
Member

peci1 commented Mar 26, 2020

You can try putting the linker flags here: https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/BUILD#L516 .

@Pedrous
Author

Pedrous commented Mar 26, 2020

Just confirming: Your link is for the branch r1.8 but I am using v1.8.0 but that should be okay, right?

@peci1
Member

peci1 commented Mar 26, 2020

I think that should be the same. Anyway, I only sent the link to make it easier for you to find the place in the build file you need to change.

@Pedrous
Author

Pedrous commented Mar 26, 2020

I put the linker flags to the BUILD file as suggested in https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/BUILD#L516. I still get the same error :(

2020-03-26 19:24:34.633813: E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "" config: gpu_options { allow_growth: true }} Registered factories are {}.
Not found: No session factory registered for the given session options: {target: "" config: gpu_options { allow_growth: true }} Registered factories are {}.

@peci1
Member

peci1 commented Mar 26, 2020

Oh, that's bad. You can also try building with framework_shared_object=true, I think I adapted the library to support even this case.

As a last resort, I'd try building the pip package via bazel, as the official tutorials do. Just make sure to add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1" to the bazel build command so that you don't face the ABI issues as with the officially distributed binary. If you install this pip package to your virtualenv, you should be able to set up the library using pip as you've already tested.

@Pedrous
Author

Pedrous commented Mar 26, 2020

One question: when I add something to the BUILD file, should I then run clean or something similar so that the new linker flags are picked up, or does that happen automatically? I was wondering if that is why nothing changed. I don't know how bazel works, but in catkin it seems to work that way.

@peci1
Member

peci1 commented Mar 26, 2020

I've no idea, sorry

@Pedrous
Author

Pedrous commented Mar 26, 2020

Since I was not sure what happened when I added the linker flags and rebuilt tensorflow, I ran bazel clean and then rebuilt tensorflow with
bazel build --config=opt --verbose_failures --define framework_shared_object=false tensorflow:libtensorflow_cc.so

When it was built and I tried to build tensorflow_ros_cpp, I noticed that tensorflow was not found because libtensorflow_framework.so did not exist. Therefore, I built it with:
catkin build --cmake-args -DTF_BAZEL_LIBRARY="/home/petrimanninen/tensorflow/bazel-bin/tensorflow/libtensorflow_cc.so" -DTF_BAZEL_SRC_DIR="/home/petrimanninen/tensorflow"
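
Since the library produced by bazel differed from the one used at the start of this thread, a small check before running catkin build can avoid pointing TF_BAZEL_LIBRARY at a missing file. The following is a minimal sketch; `pick_tf_library` is a hypothetical helper, and preferring libtensorflow_cc.so is an assumption based only on what worked here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first TensorFlow shared library found in a bazel output directory.
# $1 = directory such as <tensorflow checkout>/bazel-bin/tensorflow
pick_tf_library() {
    local name
    # Assumption: prefer libtensorflow_cc.so, the file that worked in this thread.
    for name in libtensorflow_cc.so libtensorflow_framework.so; do
        if [ -e "$1/$name" ]; then
            echo "$1/$name"
            return 0
        fi
    done
    echo "no TensorFlow library found in $1" >&2
    return 1
}

# Usage:
#   lib="$(pick_tf_library "$HOME/tensorflow/bazel-bin/tensorflow")" &&
#       catkin build --cmake-args -DTF_BAZEL_LIBRARY="$lib" -DTF_BAZEL_SRC_DIR="$HOME/tensorflow"
```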

Now I can confirm that I was able to build the master branch of tensorflow_ros_test, and when I run rosrun tensorflow_ros_test tensorflow_ros_test_node, the output is:

2020-03-26 21:12:36.432625: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Tensor<type: float shape: [] values: 6>
6

@peci1
Member

peci1 commented Mar 26, 2020

Great! So can you sum up what was needed to get it working? Did you have to change the BUILD file?

@Pedrous
Author

Pedrous commented Mar 26, 2020

I just verified that I am now also able to build and run Segmap. My setup is Ubuntu 16.04, ROS Kinetic, CUDA 9.0 and cuDNN 7.0. Here is a summary of what I needed to do to build Tensorflow 1.8.0 correctly and get it working with the tensorflow_ros_cpp package.

I downloaded bazel-0.10.0-installer-linux-x86_64.sh from https://github.com/bazelbuild/bazel/releases/tag/0.10.0 and installed it by running ./bazel-0.10.0-installer-linux-x86_64.sh --user

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout v1.8.0

gedit tensorflow/BUILD
and added "-Wl,--allow-multiple-definition" and "-Wl,--whole-archive" to the following definition at line 502 of the BUILD file:

tf_cc_shared_object(
    name = "libtensorflow_cc.so",
    linkopts = select({
        "//tensorflow:darwin": [
            "-Wl,-exported_symbols_list",  # This line must be directly followed by the exported_symbols.lds file
            "$(location //tensorflow:tf_exported_symbols.lds)",
        ],
        "//tensorflow:windows": [],
        "//tensorflow:windows_msvc": [],
        "//conditions:default": [
            "-z defs",
            "-s",
            "-Wl,--allow-multiple-definition",
            "-Wl,--whole-archive",
            "-Wl,--version-script",  #  This line must be directly followed by the version_script.lds file
            "$(location //tensorflow:tf_version_script.lds)",
        ],
    }),

I had built tensorflow before, and the BUILD file changes didn't take effect until I ran a clean:
bazel clean
and then:
bazel build --config=opt --verbose_failures --define framework_shared_object=false tensorflow:libtensorflow_cc.so

Finally, I built tensorflow_ros_cpp in the catkin workspace:
catkin build tensorflow_ros_cpp --cmake-args -DTF_BAZEL_LIBRARY="/home/<username>/tensorflow/bazel-bin/tensorflow/libtensorflow_cc.so" -DTF_BAZEL_SRC_DIR="/home/<username>/tensorflow"

Many thanks to Martin! Stay safe and healthy during the epidemic :)

@Pedrous Pedrous closed this as completed Mar 26, 2020
@peci1
Member

peci1 commented Mar 27, 2020

Good to hear you managed to get it working. Not everybody is willing to put so much effort into something ;) I'll update the readme with your findings.
