
Channel softmax #940
Merged
merged 4 commits into from Aug 19, 2014
Conversation

ronghanghu
Member

In this pull request, the behavior of SoftmaxLayer is changed from softmax over channels*height*width elements (all elements within a num) to softmax over the channels elements at each spatial position within a num. The purpose is to allow running fully-connected layers as convolutions (see Net Surgery: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb). It won't break existing Caffe examples, since a fully-connected layer's top blob has width == 1 and height == 1.

The CPU version was implemented by @longjon, and I implemented the GPU version, including GPU backward.
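
For reference, here is a minimal CPU sketch of the new behavior (an illustration only, not the actual Caffe implementation), assuming a blob stored contiguously in row-major (num, channels, height, width) order; the softmax is taken over the channel axis independently at each spatial position:

#include <algorithm>
#include <cmath>
#include <vector>

// Channel-wise softmax over a (num, channels, height, width) blob.
// Function and variable names here are made up for illustration.
void channel_softmax(const std::vector<float>& in, std::vector<float>& out,
                     int num, int channels, int height, int width) {
  const int spatial = height * width;
  out.resize(in.size());
  for (int n = 0; n < num; ++n) {
    for (int s = 0; s < spatial; ++s) {
      // Flat index of channel c at this (n, s) position.
      auto idx = [&](int c) { return (n * channels + c) * spatial + s; };
      // Subtract the channel max for numerical stability, exponentiate, normalize.
      float maxval = in[idx(0)];
      for (int c = 1; c < channels; ++c) maxval = std::max(maxval, in[idx(c)]);
      float sum = 0.f;
      for (int c = 0; c < channels; ++c) {
        out[idx(c)] = std::exp(in[idx(c)] - maxval);
        sum += out[idx(c)];
      }
      for (int c = 0; c < channels; ++c) out[idx(c)] /= sum;
    }
  }
}

With height == width == 1 this reduces to the old per-num softmax over channels, which is why existing models are unaffected.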

template <typename Dtype>
__global__ void kernel_exp(const int count, const Dtype* data, Dtype* out) {
  CUDA_KERNEL_LOOP(index, count) {
Contributor

This kernel is the same as caffe_gpu_exp isn't it? Let's remove it and replace with caffe_gpu_exp, unless I'm misunderstanding somehow. (I know it wasn't added this PR, but I just noticed it from seeing the diff.)

Member Author

I can't find caffe_gpu_exp. I only found caffe_exp, which calls vsExp in MKL.

Contributor

Whoops, my bad; I think I was thinking of caffe_gpu_powx. caffe_gpu_exp should probably exist, but device abstraction (#610) will probably take care of this, so never mind. Sorry!

@jeffdonahue
Contributor

assigning to @longjon, go ahead and merge when you're happy with everything

@longjon
Contributor

longjon commented Aug 17, 2014

@shelhamer suggested offline adding a switch to provide the original "normalize over everything" mode. So, @shelhamer, if you still want to do that, you can append to or rewrite this PR.

@shelhamer and others, which mode do we think should be the default? It seems like the channel normalization is usually what is desired, and I doubt anyone is relying on the current behavior, although it is a little jarring to change what layers do. If we do want the default to be the channel normalization, we could go ahead and merge this, and add a switch in a later PR.

caffe_cpu_gemv<Dtype>(CblasTrans, channels, spatial_dim, 1,
    bottom_diff + i * dim, sum_multiplier_.cpu_data(), 0, scale_data);
// restore the original top_diff in bottom_diff for subtraction
caffe_copy(dim, top_diff + i * dim, bottom_diff + i * dim);
Member Author

Also note that the updated SoftmaxLayer CPU implementation no longer allows in-place computation, since bottom diff is first changed and then restored:

caffe_mul(top[0]->count(), bottom_diff, top_data, bottom_diff);

while the GPU implementation still allows in-place computation.

@jeffdonahue @shelhamer should we allow in-place computations in SoftmaxLayer?
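
To illustrate the aliasing problem (a simplified, self-contained sketch; the buffer and values are made up): when the layer runs in place, top_diff and bottom_diff point to the same memory, so scratch writes into bottom_diff destroy the top_diff values that the later "restore" copy depends on.

#include <cstdio>

int main() {
  // In the in-place case the top and bottom blobs share storage,
  // so top_diff aliases bottom_diff.
  float buffer[1] = {3.f};
  float* bottom_diff = buffer;
  const float* top_diff = buffer;

  bottom_diff[0] *= 2.f;         // scratch computation writes into bottom_diff
  bottom_diff[0] = top_diff[0];  // the "restore" reads the already-clobbered value
  std::printf("%f\n", bottom_diff[0]);  // prints 6, not the original 3
  return 0;
}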

Contributor

Yeah, good catch. I did this to avoid an extra loop, but I've added it back now to allow in-place computation. There should be no performance regression in the 1x1 case, and probably not a noticeable one in the general case, and anyway the GPU implementation is available.

In order to do this, I had to add functions to math_functions for strided dot products (which of course cblas already supports, but we didn't previously have an interface for).
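
For context, a strided dot product is a thin wrapper over cblas, which already accepts increments for both vectors; a minimal sketch (illustrative only; the function actually added is caffe_cpu_strided_dot, whose exact signature may differ):

#include <cblas.h>

// Dot product of x[0], x[incx], x[2*incx], ... with the corresponding
// strided elements of y; cblas_sdot handles the increments natively.
float strided_dot(const int n, const float* x, const int incx,
                  const float* y, const int incy) {
  return cblas_sdot(n, x, incx, y, incy);
}

// Example: dot product across channels at spatial position s of image n,
// with data laid out as (num, channels, height, width) and
// spatial_dim = height * width:
//   strided_dot(channels,
//               a + n * channels * spatial_dim + s, spatial_dim,
//               b + n * channels * spatial_dim + s, spatial_dim);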

@shelhamer
Member

@longjon merge as you please, as the switch can follow. I agree channel is a reasonable default. Although it does change the default behavior, I imagine anyone who has adopted the fully-convolutional models wants channel softmax.


This provides a more direct interface to the cblas_?dot functions.
This is useful, for example, for taking dot products across channels.
@longjon
Contributor

longjon commented Aug 18, 2014

@ronghanghu I amended your commit with some aesthetic changes (making all the channel kernels follow the form kernel_channel_[word], fixing some lint errors that were being masked by NOLINT, and fixing capitalization in comments). I think this will be ready to merge once Travis passes. The GPU implementation is a little heavy in that it introduces lots of kernels instead of calling gpu_gemm and so forth, but it does the right thing by parallelizing over both batch and spatial dims, so I'll take it. Thanks for getting this written!
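
As an aside, a small CPU-side sketch of the indexing scheme such kernels use (an assumption for illustration, not the actual kernel code): each GPU thread handles one (image, spatial position) pair and loops over channels, here shown for a channel-max reduction.

#include <algorithm>
#include <cfloat>

// One iteration per (n, s) pair; on the GPU each iteration becomes a thread.
// data is laid out as (num, channels, height, width); spatial_dim = height * width.
void channel_max(const float* data, float* out,
                 int num, int channels, int spatial_dim) {
  for (int index = 0; index < num * spatial_dim; ++index) {
    const int n = index / spatial_dim;  // which image in the batch
    const int s = index % spatial_dim;  // which spatial position
    float maxval = -FLT_MAX;
    for (int c = 0; c < channels; ++c) {
      maxval = std::max(maxval, data[(n * channels + c) * spatial_dim + s]);
    }
    out[index] = maxval;
  }
}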

longjon added a commit that referenced this pull request Aug 19, 2014
@longjon longjon merged commit 78eea24 into BVLC:dev Aug 19, 2014
@ronghanghu ronghanghu deleted the channel-softmax branch August 19, 2014 15:50
@shelhamer
Member

Fixed the order of specialization and instantiation for the clang++ build in ac64a7b. You can't call caffe_cpu_strided_dot() before its specializations, as was done in caffe_cpu_dot().

This was referenced Sep 18, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014