tf_sampling_so.so error while training #223

Closed
NiranjanRavi1993 opened this issue Apr 18, 2020 · 8 comments

@NiranjanRavi1993

NiranjanRavi1993 commented Apr 18, 2020

Hi,
I followed the steps for the Semantic3D dataset, but used a custom dataset for training. I was able to create the .h5 files, and all the preparation steps were successful. But when I run
./train_val_semantic3d.sh -g 0 -x semantic3d_x4_2048_fps
the log file inside models/seg shows the following error:
tf_sampling_so.so: cannot open shared object file: No such file or directory

I checked the existing issues (charlesq34/pointnet2#48) and made changes to Pointcnn/sampling/tf_sampling_compiler.sh, but it still did not work.

I am using TensorFlow 1.15 and Python 3.6 in a conda environment. (I installed TensorFlow with pip, as suggested in one of the issues, but it still didn't work.)
Any help on how to resolve this issue would be appreciated.
Regards
Niranjan

@sayakgis

@NiranjanRavi1993: Please check the issue below; I was able to compile using the steps described there. I am not sure about Python 3.6, as the author has advised downgrading to 3.5, and that worked for me.

#182

@NiranjanRavi1993
Author

@sayakgis Hi, thank you for your reply. I tried the steps in the issue you linked, but the same problem persists.
Python version - 3.5.6
TF version - 1.10.1
CUDA version - 9.0
Conda version - 4.8.3

I created new environments and tried the above, but it still did not work. Is there anything wrong with the way I am doing this, or is the combination of versions the problem?

.sh script file:
#!/bin/bash
PYTHON=python3
CUDA_PATH=/usr/local/cuda
# Directory of the installed TensorFlow package (holds libtensorflow_framework.so)
TF_LIB=$($PYTHON -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
#PYTHON_VERSION=$($PYTHON -c 'import sys; print("%d.%d"%(sys.version_info[0], sys.version_info[1]))')
TF_PATH=$TF_LIB/include
# Compile the CUDA kernel to an object file
$CUDA_PATH/bin/nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
# Compile the op and link it with the kernel into tf_sampling_so.so
g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -L$TF_LIB -ltensorflow_framework -I $TF_PATH/external/nsync/public/ -I $TF_PATH -I $CUDA_PATH/include -lcudart -L $CUDA_PATH/lib64/ -O2

Thank you
Regards
Niranjan
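
The script above assembles the TensorFlow include and library paths by hand. As a possible alternative, the sketch below builds the same two commands from the flags that TensorFlow itself reports. It assumes `tf.sysconfig.get_compile_flags()` and `tf.sysconfig.get_link_flags()` are available in the installed TensorFlow 1.x release (worth verifying on older versions). `build_commands` and `tensorflow_flags` are hypothetical helper names, not part of PointCNN.

```python
import shlex


def build_commands(compile_flags, link_flags, cuda_path="/usr/local/cuda"):
    """Assemble the nvcc and g++ argument lists for the sampling op.

    compile_flags and link_flags are lists of strings, for example the
    values reported by TensorFlow (see tensorflow_flags below).
    """
    # Step 1: compile the CUDA kernel to an object file (same flags as the script).
    nvcc = [
        cuda_path + "/bin/nvcc", "tf_sampling_g.cu", "-o", "tf_sampling_g.cu.o",
        "-c", "-O2", "-DGOOGLE_CUDA=1", "-x", "cu", "-Xcompiler", "-fPIC",
    ]
    # Step 2: compile the op and link it into the shared library.
    # Source and object files come before the libraries they depend on.
    gpp = (
        ["g++", "-std=c++11", "tf_sampling.cpp", "tf_sampling_g.cu.o",
         "-o", "tf_sampling_so.so", "-shared", "-fPIC"]
        + list(compile_flags)
        + ["-I", cuda_path + "/include", "-L", cuda_path + "/lib64", "-lcudart"]
        + list(link_flags)
        + ["-O2"]
    )
    return nvcc, gpp


def tensorflow_flags():
    """Ask the installed TensorFlow for its compile and link flags.

    Assumption: this TensorFlow version provides both sysconfig functions.
    """
    import tensorflow as tf  # imported lazily; only needed for this call
    return tf.sysconfig.get_compile_flags(), tf.sysconfig.get_link_flags()


def as_shell(command):
    """Render an argument list as a single shell-quoted line."""
    return " ".join(shlex.quote(arg) for arg in command)
```

Calling `build_commands(*tensorflow_flags())` and printing each result with `as_shell` gives two lines that can be compared against the hand-written script, which may help spot a missing include path or ABI flag.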

@sayakgis

Could you please elaborate on what error you are getting?

@NiranjanRavi1993
Author

@sayakgis
This is what I keep getting in model/seg/pointcnn_seg_semantic_3d_x4_2048_fps.txt:

/home/iot/anaconda3/envs/test/PointCNN/data_utils.py:162: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  data = h5py.File(os.path.join(folder, line.strip()))
Traceback (most recent call last):
  File "../train_val_seg.py", line 311, in <module>
    main()
  File "../train_val_seg.py", line 136, in main
    net = model.Net(points_augmented, features_augmented, is_training, setting)
  File "/home/iot/anaconda3/envs/test/PointCNN/pointcnn_seg.py", line 11, in __init__
    PointCNN.__init__(self, points, features, is_training, setting)
  File "/home/iot/anaconda3/envs/test/PointCNN/pointcnn.py", line 64, in __init__
    from sampling import tf_sampling
  File "/home/iot/anaconda3/envs/test/PointCNN/sampling/tf_sampling.py", line 15, in <module>
    sampling_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_sampling_so.so'))
  File "/home/iot/anaconda3/envs/test/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/iot/anaconda3/envs/test/PointCNN/sampling/tf_sampling_so.so: cannot open shared object file: No such file or directory
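
The traceback shows `tf.load_op_library` failing on a path that the loader cannot open. A minimal sketch of a guard that checks for the compiled file before handing it to TensorFlow is given below. `find_op_library` and `load_sampling_module` are hypothetical names used for illustration; the actual `tf_sampling.py` calls `tf.load_op_library` directly, as the traceback shows.

```python
import os


def find_op_library(base_dir, name="tf_sampling_so.so"):
    """Return the absolute path of the compiled op, or raise with a build hint."""
    path = os.path.join(os.path.abspath(base_dir), name)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "{} is missing; compile the op in {} before training".format(
                name, os.path.dirname(path)
            )
        )
    return path


def load_sampling_module(base_dir):
    """Load the op library once the file is known to exist."""
    import tensorflow as tf  # imported lazily so the file check works without it
    return tf.load_op_library(find_op_library(base_dir))
```

With a check like this, a missing build step surfaces as a message that names the directory to compile in, rather than as a loader error at the end of a long traceback.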

@sayakgis

sayakgis commented Apr 20, 2020

Did tf_compile.sh create tf_sampling_so.so? I can upload the .so file if you need it; it was built with CUDA 9.2.

@NiranjanRavi1993
Author

Hi @sayakgis, I realized my mistake. I was able to generate tf_sampling_so.so and use it. Training on my custom datasets is now working to some extent. Thank you for your help.

@sayakgis

Thanks for the update. Which setup/environment worked for you? I am asking for the benefit of a broader audience.

@NiranjanRavi1993
Author

Yes, this is the setup I had:
Python - 3.5
CUDA - 9.0
GCC - 5.5
TF - 1.10.1
Conda - 4.8.3

All I had to do was run tf_compile.sh and then start training my model. It worked perfectly fine.
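
To compare another machine against the working setup listed above, a small report of the relevant versions can be collected as sketched here. This assumes `gcc`, `nvcc`, and `conda` are on `PATH` when installed; `tool_version` and `environment_report` are hypothetical helpers, and missing tools are reported as `None` rather than treated as errors.

```python
import platform
import shutil
import subprocess


def tool_version(executable, flag="--version"):
    """Return the output of `<executable> --version`, or None if it is not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, flag],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print their version to stderr
        universal_newlines=True,
    )
    return result.stdout.strip()


def environment_report():
    """Collect the versions that mattered in this thread."""
    report = {
        "python": platform.python_version(),
        "gcc": tool_version("gcc"),
        "nvcc": tool_version("nvcc"),
        "conda": tool_version("conda"),
    }
    try:
        import tensorflow as tf
        report["tensorflow"] = tf.__version__
    except ImportError:
        report["tensorflow"] = None
    return report
```

Printing each key and value of `environment_report()` gives a listing that can be pasted into an issue alongside the error log.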
