This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Floating point exception (core dumped) #12105

Open
uniquezhengjie opened this issue Aug 9, 2018 · 13 comments

@uniquezhengjie

I installed MXNet with `pip install mxnet-mkl`, but when I import mxnet I get "Floating point exception (core dumped)":

import mxnet
Floating point exception (core dumped)

I tested mxnet-mkl 1.2.1.post1 and mxnet-mkl 1.2.0; both fail with "Floating point exception (core dumped)".

mxnet-mkl 1.0.0 imports normally.

What is the problem? I also ran it inside Docker and got the same error.

@pengzhao-intel
Contributor

We haven't seen this issue before; I think it is caused by an environmental problem.

Could you `pip uninstall` all the current mxnet packages and re-install?

If possible, please send the full log so we can debug.

@uniquezhengjie
Author

I tried `pip uninstall`-ing all the current mxnet packages and re-installing, with the same result. In fact, I run my Docker image on two machines: one runs fine, but the other gets this error.

I don't know how to get more debug info for you; the only output is the single message "Floating point exception (core dumped)".

@uniquezhengjie
Author

I went back to version 1.0.0. Now the import works, but when I fetch the output layer with `net_out = self.model.get_outputs()`, the same error eventually appears. What should I do?

  # Wrap the input tensor in a DataBatch and run a forward pass.
  data = nd.array(im_tensor)
  print(data.shape)
  #db = mx.io.DataBatch(data=(data,), provide_data=[('data', data.shape)])
  db = mx.io.DataBatch(data=(data,))
  self.model.forward(db, is_train=False)  # inference only, no gradients
  net_out = self.model.get_outputs()      # the crash happens here

@lanking520
Member

Hi @uniquezhengjie, could you please provide more context on the system you are running on? Which platform, Python version, etc.? @szha, could you please take a look here?

@mxnet-label-bot please add [installation, python] here.

@uniquezhengjie
Author

uniquezhengjie commented Aug 10, 2018

@lanking520 Here is my system info. I tested Python 3.6.4 and Python 2.7.12 on Ubuntu 16.04 and 14.04.
root@5360f6cad7ec:/# cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
8 Intel(R) Xeon(R) CPU E5-26xx v4
root@5360f6cad7ec:/# getconf LONG_BIT
64
root@5360f6cad7ec:/# uname -a
Linux 5360f6cad7ec 4.4.0-91-generic #114-Ubuntu SMP Tue Aug 8 11:56:56 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I found that when I run `pip install mxnet` (not mxnet-mkl), or build from source (https://mxnet.incubator.apache.org/install/index.html?platform=Linux&language=Python&processor=CPU), I can import mxnet normally. But inference on my model is then much slower, almost 4-5 times slower than the MKL build on the machines where mxnet-mkl works.
Does my CPU (Intel(R) Xeon(R) CPU E5-26xx v4) not support MKL?
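One common cause of crashes in MKL-accelerated binaries is a CPU lacking the SIMD extensions the build assumes (this is a general diagnostic, not the confirmed cause here). A minimal sketch, assuming Linux; the helper name and flag list are illustrative and not part of mxnet. It parses `/proc/cpuinfo`-style text and reports which relevant flags the CPU advertises:

```python
# Hypothetical helper: report which SIMD extensions a Linux CPU advertises.
# MKL/MKL-DNN binaries commonly take SSE4.2/AVX/AVX2 code paths.
def simd_support(cpuinfo_text):
    """Parse /proc/cpuinfo text and check for common SIMD feature flags."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return {f: f in flags for f in ("sse4_2", "avx", "avx2", "avx512f")}

# Example with a made-up flags line; on a real machine, read /proc/cpuinfo:
sample = "flags\t\t: fpu vme sse4_2 avx avx2"
print(simd_support(sample))
```

On an affected machine, `simd_support(open("/proc/cpuinfo").read())` would show whether any of these extensions is missing.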

@szha
Member

szha commented Aug 10, 2018

1.0.0 was directly on MKLML and not on MKL-DNN, so the bug has likely already been addressed by the MKL-DNN integration. @uniquezhengjie, is there a blocker that prevents you from upgrading to the latest version?

@uniquezhengjie
Author

@szha If I run the mxnet-mkl version 1.0.0, I can import it, but when I fetch the output layer with `net_out = self.model.get_outputs()`, "Floating point exception" eventually appears.
I can upgrade from 1.0.0 to the latest version (mxnet-mkl 1.2.1.post1) with `pip install --upgrade mxnet-mkl`, but then the error occurs as soon as I import mxnet.
I don't know what to try next.

@szha
Member

szha commented Aug 10, 2018

Ah, I see; sorry that I misunderstood your question. Does the same happen when you install mxnet without the '-mkl' suffix? Also, could you follow the instructions in the issue template and provide the diagnostic information, so that our friends at Intel can look at the supported instruction sets? Thanks.

@pengzhao-intel
Contributor

> I tried `pip uninstall`-ing all the current mxnet packages and re-installing, with the same result. In fact, I run my Docker image on two machines: one runs fine, but the other gets this error.

Are the two machines the same?
You can try installing the binary without Docker, or building from source (building and running on the same machine).

@uniquezhengjie
Author

uniquezhengjie commented Aug 10, 2018

@pengzhao-intel They are different; in fact I tested 4 kinds of CPUs:
machine1: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
machine2: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
machine3: Intel(R) Xeon(R) CPU E5-26xx v4
machine4: Intel(R) Xeon(R) CPU E5-26xx v3
machine1 and machine2 run normally; machine3 and machine4 get the error.
I also ran the Docker images on machine3 and machine4: mxnet/python:1.2.1_cpu_mkl fails, but mxnet/python:1.2.1_cpu is OK.
I also built a Docker image from source with USE_BLAS=openblas, and it runs normally on machine3.
I don't know how to build with the MKL option, because setting USE_BLAS=mkl gives a build error.


@uniquezhengjie
Author

I get an error when running `make -j $(nproc) USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INTEL_PATH=/opt/intel`:

cmake /incubator-mxnet/3rdparty/mkldnn -DCMAKE_INSTALL_PREFIX=/incubator-mxnet/3rdparty/mkldnn/install -B/incubator-mxnet/3rdparty/mkldnn/build -DARCH_OPT_FLAGS="-mtune=generic" -DWITH_TEST=OFF -DWITH_EXAMPLE=OFF
CMake Error at CMakeLists.txt:22 (cmake_policy):
Policy "CMP0054" is not known to this version of CMake.

-- The C compiler identification is GNU 4.9.2
-- The CXX compiler identification is GNU 4.9.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- CMAKE_BUILD_TYPE is unset, defaulting to Release
-- Detecting Intel(R) MKL: trying mklml_intel
-- Intel(R) MKL: include /incubator-mxnet/3rdparty/mkldnn/install/include
-- Intel(R) MKL: lib /incubator-mxnet/3rdparty/mkldnn/install/lib/libmklml_intel.so
-- Intel(R) MKL: OpenMP lib /incubator-mxnet/3rdparty/mkldnn/install/lib/libiomp5.so
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- VTune profiling environment is unset
-- Configuring incomplete, errors occurred!
See also "/incubator-mxnet/3rdparty/mkldnn/build/CMakeFiles/CMakeOutput.log".
mkldnn.mk:38: recipe for target '/incubator-mxnet/3rdparty/mkldnn/install/lib/libmkldnn.so.0' failed
make: *** [/incubator-mxnet/3rdparty/mkldnn/install/lib/libmkldnn.so.0] Error 1

What is the problem?
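The `Policy "CMP0054" is not known` message in the log above usually means the installed CMake predates that policy, which was introduced in CMake 3.1, so MKL-DNN's CMakeLists cannot configure. A small sketch (the installed version number is illustrative; substitute the output of `cmake --version`) comparing the local version against that minimum with `sort -V`:

```shell
# Compare an installed CMake version against the oldest release that knows CMP0054.
required="3.1.0"
installed="2.8.12"   # illustrative; replace with the version reported by `cmake --version`
oldest=$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)
if [ "$oldest" != "$required" ]; then
    echo "CMake $installed is too old; upgrade to >= $required before rebuilding"
fi
```

If the check reports an old version, upgrading CMake and re-running the make command should get past this configure error.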

@mirekphd

mirekphd commented Oct 8, 2018

For the CUDA versions (as opposed to MKL), this issue seems to be solved in recent nightly builds; see #11911.
