
Proper Caffe opencl branch installation Instructions for Intel GPU #5099

Open · atlury opened this issue Dec 16, 2016 · 87 comments
atlury commented Dec 16, 2016

I am sorry that I have to open this, but neither the opencl GitHub branch nor the Google forums have any up-to-date, step-by-step instructions for installing Caffe OpenCL on an Intel GPU with the Intel OpenCL drivers, especially for someone new.

(a) Do these instructions still work?
cmake -DUSE_GREENTEA=ON -DUSE_INTEL_SPATIAL=ON -DUSE_ISAAC=ON path_to_caffe_source
make -jn
make -jn runtest

on this branch https://github.com/BVLC/caffe/tree/opencl? or

What about?
cmake -DUSE_GREENTEA=ON -DUSE_INTEL_SPATIAL=ON -DUSE_ISAAC=ON -DBUILD_SHARED_LIBS=OFF -DUSE_CUDNN=OFF -DUSE -DBUILD_docs=OFF -DBUILD_python=OFF -DBUILD_matlab=OFF /root/caffe-opencl

(b) Is ATLAS still needed for compiling OpenCL Caffe when clBLAS is there? The build keeps asking for it.

(c) What about ViennaCL? Does that branch still depend on it? Is it needed?

(d) What is LibDNN for, and what does it replace?

(e) What about ISAAC?

(f) The Windows branch, for example, says "If CUDA is not installed Caffe will default to a CPU_ONLY build". Does this mean it will not work in OpenCL mode in non-CUDA builds?

Kindly update and provide step-by-step instructions. Thank you.

naibaf7 (Member) commented Dec 16, 2016

@atlury
There is a Windows section in the Readme that explains how to compile and install on Windows.
The only step missing in that description is downloading ViennaCL-DEV:
https://github.com/viennacl/viennacl-dev

It can be put in any one of the paths where CMake will find it, such as next to the folder into which you cloned Caffe.

The build instructions are different from the Linux instructions, since it is a script that automatically takes care of CMake configuration and downloading dependencies.

Usually there's no huge need to worry about configuration on Windows, since it's designed to just work. However I will give you a quick explanation:
(a) No and no. Use scripts/build_win.cmd as described in the Readme.
(b) Yes; no matter how you compile it, a CPU BLAS is always needed. But build_win.cmd will take care of that for you, and its default configuration is to use OpenBLAS.
(c) Yes, ViennaCL is needed, clone from here: https://github.com/viennacl/viennacl-dev
(d) LibDNN is the convolution engine default for OpenCL GPUs, replacement for cuDNN.
There's also additional Intel kernels for Intel GPUs available and enabled by default.
(e) ISAAC, clBLAS and CLBlast are strictly optional. You need to compile these separately on Windows and add them to the dependencies if you want to use them. I do not guarantee or support the compilation of any of these libraries; they are supported by the respective project maintainers.
(f) No, on the OpenCL branch, this is not true. Default here is USE_GREENTEA=ON, USE_CUDA=OFF, CPU_ONLY=OFF.
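Taken together, a minimal Windows build of the OpenCL branch might look like the following. This is only a sketch: the repository URLs are the ones referenced in this thread, and everything else (Git, Visual Studio, working directory) is assumed.

```shell
:: Clone the OpenCL branch and ViennaCL-DEV side by side,
:: so CMake finds ViennaCL next to the Caffe checkout.
git clone -b opencl https://github.com/BVLC/caffe.git
git clone https://github.com/viennacl/viennacl-dev.git

:: build_win.cmd downloads the remaining dependencies
:: and drives the CMake configuration itself.
cd caffe
scripts\build_win.cmd
```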

I will update the Readme after Christmas when I have holidays. I unfortunately don't have time for a detailed step-by-step right now.
CC: @willyd

atlury (Author) commented Dec 16, 2016

@naibaf7
Thanks for the quick response. What about Linux instructions?

Are OpenCL BLAS and ISAAC still needed?
https://github.com/01org/caffe/wiki/clCaffe

naibaf7 (Member) commented Dec 16, 2016

@atlury
Two ways on Linux: use CMake, or copy Makefile.config.example to Makefile.config and compile using make all -j8; make pycaffe -j8; make runtest -j8.
Note that the compiled results from the Makefile and CMake are slightly different on Linux. The Makefile is older but easier; CMake is more complex.

This branch is not the same as https://github.com/01org/caffe/wiki/clCaffe
therefore it has different requirements. However the Intel spatial kernels from there have been merged into this branch.

Strict requirements:

  • ViennaCL, OpenCL and normal Caffe requirements such as Gflags, HDF5, etc.
  • You can get the OpenCL SDK with CUDA, the AMD APP SDK, or the Intel OpenCL SDK. This is true for both Windows and Linux. Mac OS X should provide its own OpenCL implementation.

Optional requirements:

  • clBLAS (from AMD)
  • CLBlast (from @CNugteren)
  • ISAAC
  • cuDNN
  • CUDA
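As a rough sketch of the Makefile route described above (repository URLs as referenced in this thread; the config entries named in the comments are assumptions to be checked against your Makefile.config):

```shell
# Clone the OpenCL branch and ViennaCL-DEV side by side.
git clone -b opencl https://github.com/BVLC/caffe.git
git clone https://github.com/viennacl/viennacl-dev.git
cd caffe

# Start from the example config; keep the OpenCL (Greentea) backend
# enabled, point BLAS at your CPU BLAS (e.g. OpenBLAS or ATLAS), and
# optionally enable the Intel spatial kernels.
cp Makefile.config.example Makefile.config

make all -j8
make pycaffe -j8
make runtest -j8
```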

atlury (Author) commented Dec 16, 2016

Thanks @naibaf7
And also, for Linux: is LibDNN for most nVidia and AMD chips only, and should we use Intel spatial for Intel iGPUs?

naibaf7 (Member) commented Dec 16, 2016

@atlury
Intel spatial does not support efficient backpropagation, nor all shapes of convolutions, but yes, it has the fastest forward propagation on Intel iGPUs.
But I suggest you try both and check what works best for your networks and devices.

atlury (Author) commented Dec 29, 2016

@naibaf7

Fabian, will the Windows build support compiling with MinGW-w64? Kindly let me know if there are any instructions specific to it. Microsoft's studio is too bloated.

naibaf7 (Member) commented Dec 29, 2016

@atlury Currently no, not that I am aware of. @willyd is the main contributor and maintainer of Windows building, so maybe he can answer that.
While Microsoft Visual Studio might be a bit bloated, it's quite convenient, since @willyd precompiled all dependencies for VS2015 and VS2013. So I imagine using MinGW-w64 would be a lot more work.

willyd (Contributor) commented Dec 29, 2016

I have no intention of supporting MinGW-w64, as CUDA does not support MinGW as a host compiler on Windows. That being said, I welcome any PRs related to supporting MinGW-w64 if they don't add too much complexity to the build.

naibaf7 (Member) commented Dec 29, 2016

@willyd
Cool, that's what I thought. In this case I am in favor of simplicity, since Windows support without MinGW-w64 does not look like a major pitfall to me. It's somewhat preferable to use the standard compiler of each respective operating system.
I'm mostly worried about the support overhead when people use tricky build configurations.

atlury (Author) commented Jan 8, 2017

@naibaf7

Does the Windows OpenCL build include support for engine: SPATIAL? When I include engine: SPATIAL or engine: INTEL_SPATIAL, I get one of the following errors:

Layer conv1 has unknown engine.
Error parsing text-format caffe.NetParameter: 18:3: Unknown enumeration value of "SPATIAL" for field "engine".

The readme at https://github.com/BVLC/caffe/tree/opencl is confusing: it mentions both "add entry engine: SPATIAL to all convolution layer specification" as well as "engine: INTEL_SPATIAL <-------------------------- this line!".

Which one is it?

And it runs fine without engine: spatial in the prototxt.
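For reference, and as confirmed further down this thread, the enum value the prototxt parser accepts is INTEL_SPATIAL inside convolution_param. A minimal fragment might look like this (the layer shown is illustrative):

```protobuf
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    engine: INTEL_SPATIAL
  }
}
```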

opencl-caffe-test.exe imagenet_deploy.prototxt bvlc_reference_caffenet.caffemodel imagenet_mean.binaryproto synset_words.txt truck.jpg
Use GPU with device ID 0
---------- Prediction for truck.jpg ----------
0.9872 - "n03417042 garbage truck, dustcart"
0.0110 - "n04467665 trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi"
0.0013 - "n03496892 harvester, reaper"
0.0002 - "n04428191 thresher, thrasher, threshing machine"
0.0001 - "n04461696 tow truck, tow car, wrecker"

Also, here are a few "other" observations:
a) It works better when compiled as a DLL instead of a static library. In particular this solves the error "Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type" (Visual Studio 2015).
b) It doesn't seem to pick up OpenCL.lib, so the workaround is to manually copy it from the OpenCL SDK folder into the build folder (what does it expect the path variable name to be?).
c) The libraries extracted in the build folder could be updated to the latest versions (say, for example, OpenCV 3.2, etc.).

atlury (Author) commented Jan 8, 2017

Further

C:\Downloads\xxx.caffe-opencl-build\bin>caffe device_query
I0108 12:35:04.885713 19872 common.cpp:382] Total devices: 3

I0108 12:35:04.888244 19872 common.cpp:383] CUDA devices: 0
I0108 12:35:04.889102 19872 common.cpp:384] OpenCL devices: 3

I0108 12:35:04.889681 19872 common.cpp:408] Device id: 0

I0108 12:35:04.890744 19872 common.cpp:410] Device backend: OpenCL
I0108 12:35:04.891839 19872 common.cpp:412] Backend details: Intel(R) Corporation: OpenCL 1.2
I0108 12:35:04.893450 19872 common.cpp:414] Device vendor: Intel(R) Corporation
I0108 12:35:04.894731 19872 common.cpp:416] Name: Intel(R) HD Graphics 4400
I0108 12:35:04.895730 19872 common.cpp:418] Total global memory: 1708759450

I0108 12:35:04.897233 19872 common.cpp:408] Device id: 1

I0108 12:35:04.898505 19872 common.cpp:410] Device backend: OpenCL
I0108 12:35:04.899590 19872 common.cpp:412] Backend details: Intel(R) Corporation: OpenCL 1.2
I0108 12:35:04.901091 19872 common.cpp:414] Device vendor: Intel(R) Corporation
I0108 12:35:04.902592 19872 common.cpp:416] Name: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
I0108 12:35:04.904093 19872 common.cpp:418] Total global memory: 8513761280

I0108 12:35:04.905594 19872 common.cpp:408] Device id: 2

I0108 12:35:04.907114 19872 common.cpp:410] Device backend: OpenCL
I0108 12:35:04.908617 19872 common.cpp:412] Backend details: Intel(R) Corporation: OpenCL 2.1
I0108 12:35:04.910100 19872 common.cpp:414] Device vendor: Intel(R) Corporation
I0108 12:35:04.911598 19872 common.cpp:416] Name: Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
I0108 12:35:04.913100 19872 common.cpp:418] Total global memory: 8513761280

naibaf7 (Member) commented Jan 8, 2017

Looks good to me, although it seems you have both a newer OpenCL 2.1 and an older OpenCL 1.2 installed. As it's still a Haswell CPU I am not sure if Intel already has a 2.1/2.0 driver for your chip. But you should try to update your OpenCL SDK for your GPU.

Anyways, if you want to use INTEL_SPATIAL you need to also enable it at compile time. After that it becomes the standard engine on Intel GPU devices.
You can do that here:
https://github.com/BVLC/caffe/blob/opencl/scripts/build_win.cmd#L82
(scripts/build_win.cmd, line 82)

However, the Intel spatial kernel has not been thoroughly tested on Windows yet.

atlury (Author) commented Jan 8, 2017

I will try to update the OpenCL SDK. I just saw your commits; I will try to enable them, recompile, test, and report back.
Thanks

atlury (Author) commented Jan 8, 2017

OK: with "if NOT DEFINED USE_INTEL_SPATIAL set USE_INTEL_SPATIAL=1",

build_win.cmd throws the following error.

C:\Downloads\caffe-opencl\build\ALL_BUILD.vcxproj" (default target) (1) ->
C:\Downloads\caffe-opencl\build\src\caffe\caffe.vcxproj" (default target) (3) ->

(ClCompile target) -> C:\Downloads\caffe-opencl\src\caffe\layers\conv_layer_spatial.cpp(1453): error C2572: 'caffe::ConvolutionLayerSpatial::swizzleWeights': redefinition of default argument: parameter 1 [C:\Downloads\caffe-opencl\build\src\caffe\caffe.vcxproj]

C:\Downloads\caffe-opencl\src\caffe\layers\conv_layer_spatial.cpp(1458): error C2572: 'caffe::ConvolutionLayerSpatial::swizzleWeights': redefinition of default argument: parameter 1 [C:\Downloads\caffe-opencl\build\src\caffe\caffe.vcxproj]

naibaf7 (Member) commented Jan 8, 2017

Ok, I'll look into that.
@gongzg for reference.

gfursin commented Jan 18, 2017

Hi all,
Thank you for great work!
I managed to compile and run caffe-opencl on Windows and an Intel HD 4400 with USE_INTEL_SPATIAL=0 (caffe time is sadly around 2x slower than running caffe-cpu on the 2-core i5-4210U, unless I am doing something wrong). However, when compiling with USE_INTEL_SPATIAL=1, I also get the same error as @atlury (and I believe I have the same hardware in my Lenovo X240). I am curious to see whether using INTEL_SPATIAL will make caffe-opencl run faster on this GPU than on the CPU ...

naibaf7 (Member) commented Jan 18, 2017

@gfursin It should, by a large margin. LibDNN expects the GPU to have a different memory architecture than what Intel chips have, so it does not run optimally at the moment.
We're currently investigating how to fix the Intel kernels so that they work on Windows as well.

gfursin commented Jan 18, 2017

Super! Thanks a lot!

gfursin commented Jan 18, 2017

By the way, @atlury, when selecting devices 1 and 2, "caffe time" crashed each time after around 10 seconds - did you see the same behavior? Thanks!

atlury (Author) commented Jan 19, 2017

@gfursin No, I did not run caffe time (I will try to and report back). I was frustrated with Windows and later shifted to Ubuntu 17.04. See my comment on Linux in #5165: it works with Intel spatial and I get more than 30 fps (VGG) on Linux.

There is an Intel paper published here (clCaffe):
http://www.slideshare.net/IntelSoftware/clcaffe-unleashing-the-power-of-intel-graphics-for-deep-learning-acceleration

where the following benchmarks (page 28, GT3 GPU) were reported using INTEL_SPATIAL in the convolution layers:
AlexNet - 290 images/second
GoogleNet - 77 images/second
VGGA - 55 images/second
Overfeat - 91 images/second

I really want to test out object detection (not just classification) using INTEL_SPATIAL as well, but there is no example as such anywhere. I doubt the Caffe layers are ready yet. @naibaf7?

@gongzg is there any source code for the above tests that we can try?

Further, LibDNN has been made to work with tiny-dnn, which is exciting (although there are not many pre-trained models there). I also want to test out quantization and see how OpenCL can help there (8-bit, XNOR, etc.). Finally, object detection in OpenCL in real time would be awesome! I hope @naibaf7 can throw some light on this.

naibaf7 (Member) commented Jan 19, 2017

@atlury I'll get back to you next week regarding the more difficult questions.
Intel spatial automatically gets used when you compile with the option enabled.
For object segmentation and detection I suggest you read my ISBI 2016 paper and technical report. I have SK-Net and U-Net architectures described there that can do this very fast. AlexNet can be converted to such a SK-Net.
You need to use LibDNN though to keep memory usage low in SK/U-Net.

atlury (Author) commented Jan 19, 2017

Wow, I just read your paper... the concept of strided kernels seems very impressive. Not to hijack this thread, but all of this will eventually need to be tested with OpenCL under Windows. But before that...

Is this a Python-only implementation? No C++? Are there any pre-trained models? Is this the repo: https://github.com/naibaf7/PyGreentea/tree/master/examples ? Yes, I am going to use LibDNN...

naibaf7 (Member) commented Jan 19, 2017

@atlury Yes the original interface was C++ but we switched to python. However if you want to provide the data through HDF5 or your own C++ interface that will work too. Just use the network generator codes that I provide in python to help you create the correct prototxt for SK/U-type networks.
Here's a slightly older but full technical report: https://arxiv.org/abs/1509.03371, it includes performance numbers before LibDNN was programmed.
We do not provide pre-trained models at this point, since the datasets (EM classification) we use these on and our results are not yet published.

gongzg commented Jan 20, 2017

@atlury Some of the benchmark data were measured using convnet-benchmarks, and you can reproduce them on your platform. We don't currently have other examples to share publicly.

gfursin commented Jan 23, 2017

@atlury - thanks a lot for the references! I had a lot of trouble installing and using OpenCL for Intel GPUs on Ubuntu in the past (I had to recompile the Linux kernel), but maybe the latest drivers will work ok - I need to check that. By the way, in #5165 you have a snapshot of a webcam + Caffe classification with FPS measurements - may I ask which program you used for that? Thanks a lot!

atlury (Author) commented Jan 23, 2017

@gfursin

Please do the following.

  1. Use http://cdimage.ubuntu.com/daily-live/current/

  2. Install opencl SDK and opencl Run time from (kernel patch is not required)
    https://software.intel.com/en-us/intel-opencl/download
    https://software.intel.com/en-us/articles/opencl-drivers

  3. Download https://github.com/BVLC/caffe/tree/opencl
    (a) Please compile with ViennaCL, LibDNN, Intel spatial and OpenCV enabled. Please build a shared library. I don't enable Python since I don't use it often.

  4. VGG caffemodel, prototxt
    Download
    http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
    https://gist.githubusercontent.com/ksimonyan/211839e770f7b538e2d8/raw/0067c9b32f60362c74f4c445a080beed06b07eb3/VGG_ILSVRC_16_layers_deploy.prototxt

include engine: INTEL_SPATIAL for all convolutional layers in your deploy.proto

Get the synset_words.txt

  1. Test using this program
    https://gist.github.com/atlury/f65fd41eb805cc5f77f666a59e71eae2

Just make sure the input_dim in your prototxt is 1 (you are only giving it one image at a time) and not 10, with 3 channels; the resizing is automatic.
input_dim: 1
input_dim: 3
input_dim: 224
input_dim: 224

For any additional help, buzz me on skype:atlury or gtalk:atlury.

Please note that this will only work on Linux; OpenCL support for Windows is still being worked on by @naibaf7.

gfursin commented Jan 23, 2017

Thank you very much @atlury for all details - very much appreciated - I will test it soon! By the way, I started automating installation of Caffe on Windows (CPU and OpenCL mode) using Collective Knowledge Framework, but it still needs more testing: https://github.com/dividiti/ck-caffe
I am waiting for a feedback from my colleagues and if it works fine, we will make an official release in a couple of weeks (possibly with a support for Android devices too) ...

naibaf7 (Member) commented Nov 6, 2017

@bxk-sonavex It will work, but not with the Intel convolutions, so performance will not be optimal.
At the moment I think you can't find that, but I am working on a solution.
Your problem has more to do with missing OpenCL headers. Which OpenCL have you installed? The Intel SDK?

bxk-sonavex commented Nov 7, 2017

@naibaf7 Yes, I am using Intel SDK v6.3. I found a workaround here (#5575) and it works for me. Now I got the opencl branch compiled. Further, I tested my build using the mnist example provided in the examples folder. When using CPU (by modifying lenet_solver.prototxt), the train_lenet ran without any problem and the final training accuracy is 0.9902, which is as expected.

I1107 13:53:43.139747 3512 solver.cpp:421] Test net output #0: accuracy = 0.9902
I1107 13:53:43.139747 3512 solver.cpp:421] Test net output #1: loss = 0.0277191 (* 1 = 0.0277191 loss)
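For anyone reproducing this: the LeNet run above uses the standard MNIST scripts shipped with the Caffe tree. A sketch, assuming the repo root and a finished build:

```shell
# Fetch and convert the MNIST data, then train LeNet.
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
# Switch between CPU and GPU via solver_mode in
# examples/mnist/lenet_solver.prototxt before running:
./examples/mnist/train_lenet.sh
```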

However, when using GPU, I got "caffe.exe has stopped working" error message window and the accuracy is just 0.1009.

I1107 14:11:15.651798 7872 solver.cpp:421] Test net output #0: accuracy = 0.1009
I1107 14:11:15.651798 7872 solver.cpp:421] Test net output #1: loss = 87.31 (* 1 = 87.31 loss)

Could you give me some leads on what happened? How to solve it? Or is this the thing that @gongzg mentioned?

That may not help given that some parts of the Intel OpenCL implementation don't work on Windows. But working on it, as you know :)

The places I modified from the default build_win.cmd are

set WITH_NINJA=1 
set CMAKE_BUILD_SHARED_LIBS=1 
set PYTHON_VERSION=3 
set RUN_INSTALL=1

Should I set the USE_INTEL_SPATIAL?

bxk-sonavex commented:
When USE_INTEL_SPATIAL=1 is set, the branch cannot be compiled. The error is

ninja: build stopped: subcommand failed.

gongzg commented Nov 8, 2017

@naibaf7 The 01org version works fine on Windows now. I'm still busy with other things, so I haven't had enough time to submit all the fixes to this OpenCL branch. I will do that when I have some time in the near future. @bxk-sonavex You can try the 01org version following the wiki page, and if you meet any problems with it, please let me know.

bxk-sonavex commented Nov 8, 2017

@gongzg Thanks! Following the instruction on https://github.com/01org/caffe/wiki/clCaffe#windows-support-for-intel-gen-platform, I got the error message:

fatal error C1083: Cannot open include file: 'caffe/proto/caffe.pb.h': No such file or directory

FYI:
https://github.com/ptillet/isaac.git is only compatible with NVIDIA hardware and cannot even be compiled, so I cloned https://github.com/intel/isaac instead.

UPDATE:
Manually generated the files via

build\libraries\bin\protoc.exe src\caffe\proto\caffe.proto --cpp_out=.\

Supposedly, the files should be generated automatically.

Then I got the following error:

"C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj" (default target) (1) ->
(CustomBuild target) ->
  C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code -1073741515. [C:\DL\clCaffe\build\src\caffe\test\runtest.vc
xproj]

    2345 Warning(s)
    1 Error(s)

Time Elapsed 00:03:55.08
ERROR: Tests failed

Disabled RUN_TESTS and building the third time...

gongzg commented Nov 8, 2017

@bxk-sonavex It seems that it was already built successfully. You need to copy the DLL files into the executable's directory:
"Please be noted that, after the building finished successfully, before you try to run the application, you need to copy the dl.dll (dlfcn) and isaac.dll (isaac) into the same directory or put them into a system directory."

bxk-sonavex commented:
@gongzg I added the folders of the two DLLs to the system path instead of copying them into the test folder. Now I got another error, which looks pretty serious...

"C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj" (default target) (1) ->
(CustomBuild target) ->
  CUSTOMBUILD : Fatal error : Intel iGPU device found but doesn't support cl_intel_subgroups_short. [C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj]

    2333 Warning(s)
    1 Error(s)

Time Elapsed 00:05:41.97
ERROR: Tests failed

I am using an Intel Iris Plus Graphics 650 and intel_sdk_for_opencl_setup_6.3.0.1904. Any thoughts and solutions?

gongzg commented Nov 8, 2017

@bxk-sonavex You need to update your Intel Graphics driver to the latest version.

bxk-sonavex commented Nov 8, 2017

@gongzg Thanks, that solved the compile error. When running the tests, I got a whole bunch of errors like these (I may not have caught all of them):

C:\DL\clCaffe\src\caffe\test\test_argmax_layer.cpp(132): error : Expected: (bottom_data[i * dim + j]) <= (max_val), actual: -0.402832 vs -0

C:\DL\clCaffe\src\caffe\test\test_convolution_layer_spatial.cpp(735): error : The difference between top_data[i] and ref_top_data[i] is 1.8077674604790599e+28, which exceeds delta, where [C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj]
  top_data[i] evaluates to -1.8077674604790599e+28,
  ref_top_data[i] evaluates to 7.1034564971923828, and
  delta evaluates to 9.9999997473787516e-05.

C:\DL\clCaffe\src\caffe\test\test_convolution_layer_spatial.cpp(735): error : The difference between top_data[i] and ref_top_data[i] is 1.803808228419822e+28, which exceeds delta, where [C:\DL\clCaffe\build\src\caffe\test\runtest.vcxproj]

    2418 Warning(s)
    17672 Error(s)

Time Elapsed 00:10:25.65
ERROR: Tests failed

Should I be concerned about these errors?

Anyway, I am testing the build using the mnist example. It's extremely slow - much slower than the original Caffe using the CPU. And there are some warnings (repeated several times):

warning: Linking two modules of different data layouts: '' is 'e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024' whereas '<origin>' is 'e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32:64'

warning: Linking two modules of different target triples: ' is 'spir64' whereas '<origin>' is 'vISA_64'

Any idea?

atlury (Author) commented Nov 8, 2017

@bxk-sonavex

Why don't you work with Caffe on Linux for the time being? I guess the devs are focused more on getting the FP16, INT8, etc. code running smoothly, especially naibaf7 (David).

Proper Windows support will come eventually.

Just a suggestion though.

bxk-sonavex commented:
@atlury I'd love to! But our system is Windows 10 + Intel Iris ... Any idea when Windows support will come? Or does any other DL platform work (using the GPU)?

bxk-sonavex commented Nov 8, 2017

@gongzg Just want to update you with the performance
CPU: 7 minutes 33 seconds, accuracy = 0.9914
GPU: 29 minutes 34 seconds, accuracy = 0.8406

I am wondering what the performance is on Linux; then I could get a basic idea of the speedup of the Intel GPU (OpenCL) vs. the CPU. Thanks!

atlury (Author) commented Nov 8, 2017

@bxk-sonavex

Ben, did you enable the OpenCL kernels? Did you try using INTEL_SPATIAL?

bxk-sonavex commented Nov 8, 2017

@atlury What do you mean by "enable the opencl kernels"? Yes, I followed the instructions here (https://github.com/01org/caffe/wiki/clCaffe#how-to-build) and did "set USE_INTEL_SPATIAL=1" on the command line (not by directly modifying the build_win.cmd file).

UPDATE:
INFO: ============================================================
INFO: Summary:
INFO: ============================================================
INFO: MSVC_VERSION = 14
INFO: WITH_NINJA = 0
INFO: CMAKE_GENERATOR = "Visual Studio 14 2015 Win64"
INFO: CPU_ONLY = 0
INFO: USE_CUDA = 0
INFO: USE_CUDNN = 0
INFO: USE_GREENTEA = 1
INFO: USE_LIBDNN = 1
INFO: USE_OPENMP = 0
INFO: USE_INDEX64 =
INFO: USE_INTEL_SPATIAL = 1
INFO: USE_ISAAC = 1
INFO: CMAKE_CONFIG = Release
INFO: USE_NCCL = 0
INFO: CMAKE_BUILD_SHARED_LIBS = 0
INFO: PYTHON_VERSION = 2
INFO: BUILD_PYTHON = 0
INFO: BUILD_PYTHON_LAYER = 0
INFO: BUILD_MATLAB = 0
INFO: PYTHON_EXE = "python"
INFO: RUN_TESTS = 1
INFO: RUN_LINT = 0
INFO: RUN_INSTALL = 1
INFO: ============================================================

atlury (Author) commented Nov 8, 2017

@bxk-sonavex

Ben, you will need to include engine: INTEL_SPATIAL for all convolutional layers in your deploy.prototxt. I have personally tested it in real time on Linux.

#5165

"I have tested on an Intel tv stick, webcam using Intel Spatial kernels and using 19-layer vgg model. I am able to get real time classification and all under 3.5 watts"

Windows should also work.

gongzg commented Nov 8, 2017

@bxk-sonavex For the issue with the 01org version, please open an issue there. There are some test failures due to an FP16 precision issue in those gradient test cases, which is not critical. The extremely slow performance should be caused by the auto-tuning; it should be much faster when you run it again. You can first try using build/tools/caffe to measure the forward performance for AlexNet.
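A quick way to take that measurement (a sketch; the caffe time flags below are from the stock command-line tool, and the model path assumes the AlexNet definition bundled with the Caffe tree):

```shell
# Time 50 iterations on OpenCL device 0. The second run should be
# much faster once the auto-tuner cache is warm.
./build/tools/caffe time \
  --model=models/bvlc_alexnet/deploy.prototxt \
  --gpu=0 --iterations=50
```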

gfursin commented Nov 10, 2017

By the way, I just noticed that @CNugteren released version 1.2.0 of his auto-tuned CLBlast library a few days ago. I checked it, and it seems to be working with Caffe on my Windows 10 Lenovo laptop with an old Intel 4400 GPU (as well as on Linux) - so it can be a nice addition to Caffe, since the previous CLBlast version was seg-faulting on Windows!

If you are interested, you can check the speed of Caffe with LibDNN and CLBlast for example on SqueezeDet as following (the same procedure on both Windows and Linux):

$ pip install ck
$ ck pull repo --url=https://github.com/dividiti/ck-caffe
$ ck install package:lib-caffe-bvlc-opencl-libdnn-clblast-universal-20171015

It will take some time since CK will attempt to detect your environment and compilers,
and will then rebuild all necessary dependencies on your machine.

After that you can just install SqueezeDet and run internal time:

$ ck install package:caffemodel-deepscale-squeezenet-1.1
$ ck run program:caffe --cmd_key=time_gpu

The first run can be a bit slow due to kernel compilation and caching so the second run will be much faster!

You can also benchmark image classification:

$ ck pull repo:ctuning-datasets-min
$ ck run program:caffe --cmd_key=classify

Not related to Intel but just a note that there seems to be a minor bug when compiling Caffe with CLBlast 1.2.0 for Android ARM64 using Android GCC 4.9.x ("to_string" not found in std class):

$ ck install package:lib-caffe-bvlc-opencl-libdnn-clblast-universal-20171015 --target_os=android21-arm64 --env.DISABLE_DEVICE_HOST_UNIFIED_MEMORY=ON
$ ck compile program:caffe-time-opencl --target_os=android21-arm64
$ ck run program:caffe-time-opencl --target_os=android21-arm64

It would be nice to fix this, since CLBlast 1.1.0 works fine on Android... In that case, it would work with Caffe across all platforms.

Hope it's of any help and have a good weekend!

CNugteren commented:
there seems to be a minor bug when compiling Caffe with CLBlast 1.2.0 for Android ARM64 using Android GCC 4.9.x ("to_string" not found in std class):

Not sure whether you mean that this is a bug in CLBlast or in Caffe? In any case, CLBlast has this implemented in a special Android header. Perhaps that could be used within Caffe as well?

gfursin commented Nov 11, 2017

@CNugteren - I just checked and the problem is not in CLBlast. I just forgot a patch in the CK which was fixing LibDNN for Android (so my fault). I have added it (https://github.com/dividiti/ck-caffe/blob/master/package/lib-caffe-bvlc-opencl-clblast-universal/patch.android/android.fgg.patch3) and it's now possible to compile Caffe with CLBlast and libDNN. I checked classification and benchmarking examples on my Samsung S7 - works fine. So sorry for this false alarm and thanks for releasing a new CLBlast - I can now use it in Caffe on Linux, Windows and Android.

bxk-sonavex commented:
@gfursin Is this a version using the CPU or the GPU (OpenCL)? I thought OpenCL was not working on Windows yet (or at least not with Intel iGPUs yet). What are you using on Windows?

atlury (Author) commented Nov 12, 2017

@bxk-sonavex

Ben, sorry for the delay in responding. I was away.

To quote @naibaf7
"The convolution method ("engine") can alternatively be selected/overwritten in the network prototxt file"

Thus, add the entry "engine: INTEL_SPATIAL" to every convolution layer specification.

Take AlexNet as an example: edit the file, say, $CAFFE_ROOT/models/bvlc_alexnet/train_val.prototxt, and add the following line to make the conv1 layer be computed using spatial convolution. Likewise, change the other layers.

 layer {
   name: "conv1"
   type: "Convolution"
   bottom: "data"
   top: "conv1"
   param {
     lr_mult: 1
     decay_mult: 1
   }
   param {
     lr_mult: 2
     decay_mult: 0
   }
   convolution_param {
     num_output: 96
     kernel_size: 11
     stride: 4
     engine: INTEL_SPATIAL 		<-------------------------- this line!
     weight_filler {
       type: "gaussian"
       std: 0.01
     }
     bias_filler {
       type: "constant"
       value: 0
     }
   }
 }

Edit: My bad, I see you opened another thread and seem to have progressed a bit further.

@gfursin

gfursin commented Nov 12, 2017

@bxk-sonavex - I use Caffe OpenCL version (with libDNN and CLBlast) on Windows with old Intel 4400 GPU WITHOUT Intel Spatial - it seems to be working fine but it may be suboptimal. Here is the list of Caffe devices ("ck run program:caffe --cmd_key=query_gpu_opencl"):
output_caffe_opencl_devices.txt

Here is the output from image classification on Windows with above Caffe OpenCL version and GoogleNet:
output_caffe_opencl_image_classification.txt

I mostly check inference/object detection at this stage (we are trying to unify DNN installation, benchmarking and optimization across all possible platforms) so I didn't really stress other Caffe capabilities and models on Windows with OpenCL ...

I also just tried to compile Caffe OpenCL with Intel Spatial ON ("ck install package:lib-caffe-bvlc-opencl-libdnn-clblast-universal --env.USE_INTEL_SPATIAL=ON") and I observe the same 2 build errors that were reported earlier by @atlury:
output_caffe_build_error_with_intel_spatial.txt

@rachithayp

Is there a build script available for Linux (Ubuntu 16.04) too? I am getting errors when trying to compile.

@atlury

atlury commented Jul 30, 2018

@rachithayp Follow the instructions carefully; it will work even on the 18.0x series. We have tested it.

@gfursin

gfursin commented Jul 30, 2018

Hi @rachithayp. Just a note that you likely need to patch the kernel to make Intel OpenCL work on Ubuntu 16.04: https://github.com/dividiti/ck-caffe/wiki/Installation#Intel_CPUGPU_Linux .

I managed to build the OpenCL branch of Caffe on my Ubuntu 18.04 machine (Lenovo T470p laptop with an Intel GPU) a few weeks ago, without patching the kernel and with the latest Intel OpenCL, via CK:

$ sudo pip install ck

$ ck pull repo --url=https://github.com/ctuning/ck-caffe

$ ck install package:lib-caffe-bvlc-opencl-viennacl-universal --env.USE_INTEL_SPATIAL=ON --env.CAFFE_BUILD_PYTHON=ON

CK will attempt to detect your available compilers, OpenCL libraries and other dependencies, and will invoke cmake for Caffe. If the build is successful, you can check the installation using the CK virtual env:

$ ck show env
$ ck virtual env --tags=lib,caffe
> python
import caffe

You can also try a sample image classification as follows:

$ ck compile program:caffe-classification-opencl --speed
$ ck run program:caffe-classification-opencl

Good luck.

cc @ens-lg4 and @psyhtest ...

@rachithayp

@atlury I was able to compile using the below cmake:
cmake .. -DUSE_CUDA=OFF -DBUILD_docs=0 -DOPENCL_LIBRARIES=<> -DOPENCL_INCLUDE_DIRS=<>

But trying to compile with USE_INTEL_SPATIAL=ON gives the errors below:
cmake .. -DUSE_GREENTEA=ON -DUSE_CUDA=OFF -DUSE_INTEL_SPATIAL=ON -DBUILD_docs=0 -DOPENCL_LIBRARIES=<> -DOPENCL_INCLUDE_DIRS=<>

/home/intel/Documents/caffe_src/opencl_caffe/src/caffe/libdnn/libdnn_conv_spatial.cpp:19:1: error: ‘LibDNNConvSpatial’ does not name a type
LibDNNConvSpatial::LibDNNConvSpatial(LibDNNConvConfig config) {
^
/home/intel/Documents/caffe_src/opencl_caffe/src/caffe/libdnn/libdnn_conv_spatial.cpp:117:25: error: expected initializer before ‘<’ token
string LibDNNConvSpatial::generate_fw_defs() {

Any idea what could be wrong? Also, there is no include/caffe/greentea folder on the opencl branch, so I copied it from "https://github.com/01org/caffe".

@atlury

atlury commented Aug 3, 2018

@rachithayp
Can you try the instructions from the chapter below? It's a rough cut of the installation chapter from our upcoming book on OpenCL Caffe. Thank you @naibaf7

I hope it will shed some light and help you in your OpenCL Caffe endeavors.

python-deep-learning-installation-chap.pdf

@InonS mentioned this issue Oct 16, 2018
@mahaoyanghb

> @bxk-sonavex - I use Caffe OpenCL version (with libDNN and CLBlast) on Windows with old Intel 4400 GPU WITHOUT Intel Spatial - it seems to be working fine but it may be suboptimal. Here is the list of Caffe devices ("ck run program:caffe --cmd_key=query_gpu_opencl"):
> output_caffe_opencl_devices.txt
>
> Here is the output from image classification on Windows with above Caffe OpenCL version and GoogleNet:
> output_caffe_opencl_image_classification.txt
>
> I mostly check inference/object detection at this stage (we are trying to unify DNN installation, benchmarking and optimization across all possible platforms) so I didn't really stress other Caffe capabilities and models on Windows with OpenCL ...
>
> I also just tried to compile Caffe OpenCL with Intel Spatial ON ("ck install package:lib-caffe-bvlc-opencl-libdnn-clblast-universal --env.USE_INTEL_SPATIAL=ON") and I observe the same 2 build errors as was reported earlier by @atlury:
> output_caffe_build_error_with_intel_spatial.txt

Does your HD 4400 run faster with Caffe than the CPU?
I compiled clCaffe and ran it on my HD 5500, but it's 5 times slower than the CPU (i3 5005U).
I don't know why.
