
Confusion between the --gpu flag and solverproto #925

Closed
Yangqing opened this issue Aug 14, 2014 · 10 comments

@Yangqing
Member

Currently in tools/caffe we can use --gpu to specify GPUs. However, during training this flag is ignored and the GPU ID in the solver protobuf is used instead. This can be quite confusing to the user; we should probably consider moving the training mode and training GPU ID options outside the solver prototxt, or overriding them with the command-line option.
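A minimal sketch of the override behavior under discussion (not the actual Caffe code; `SolverParam`, `ApplyGpuFlag`, and the field names are hypothetical stand-ins for the real `caffe::SolverParameter` plumbing): when --gpu is passed, the command line rewrites the solver's mode and device, otherwise the prototxt values stand.

```cpp
#include <cassert>

// Simplified stand-in for the solver-mode/device fields of
// caffe::SolverParameter (hypothetical, for illustration only).
struct SolverParam {
  bool use_gpu = false;  // corresponds to solver_mode: CPU/GPU
  int device_id = 0;     // corresponds to device_id in the prototxt
};

// If gpu_flag >= 0 (i.e. --gpu was given on the command line), rewrite
// the solver settings so the flag wins over the prototxt; otherwise
// leave the prototxt configuration untouched.
void ApplyGpuFlag(SolverParam& param, int gpu_flag) {
  if (gpu_flag >= 0) {
    param.use_gpu = true;
    param.device_id = gpu_flag;
  }
}
```

With this scheme, `caffe train --gpu 1` would run on GPU 1 even if the solver prototxt says `device_id: 2`.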

@Yangqing
Member Author

(Thanks @yosinski for raising this in caffe-users!)

@shelhamer
Member

Oh, my intention was to override the solver configuration when the --gpu arg is given. I must have overlooked it.

@Yangqing
Member Author

Yeah, the code is not there yet, although it is trivial to do - we just need to change the train() function so that the solver mode and GPU ID are rewritten.

In fact, I am thinking of marking the two solver flags as deprecated:

https://developers.google.com/protocol-buffers/docs/proto#options

so that we solely rely on the runtime to determine the mode rather than specifying things in the solver.
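Per the protobuf docs linked above, marking a field deprecated is a field option. A hypothetical excerpt of what this could look like in caffe.proto (the field numbers and the exact fields shown here are assumptions, not the real file):

```protobuf
message SolverParameter {
  // ... training hyperparameters ...

  // Deprecated: use the runtime --gpu flag instead of configuring the
  // mode and device in the solver prototxt.
  optional SolverMode solver_mode = 17 [deprecated = true];
  optional int32 device_id = 18 [deprecated = true];
}
```

Deprecated fields still parse, so existing prototxts would keep working while generated code warns on use.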

@longjon
Contributor

longjon commented Aug 14, 2014

Yeah, I'm in favor of mode and device being specified on the command line only -- I was just irritated by this, and it's nice to have the solver prototxt be simply "the parameters needed for training", which are independent of device.

@yosinski
Contributor

@longjon +1

@sguada
Contributor

sguada commented Aug 14, 2014

I would prefer to keep device_id in the prototxt and overwrite it if needed, since in the future one could define a different device_id per layer for parallelism.


@shelhamer
Member

@sguada I agree we need a device_id field for the layers, but it isn't needed for the solver. I think it's alright to deprecate the solver message device_id. When we do multi-device parallelism I don't think we should hardcode an actual device ID into our model prototxt, but instead want to have the layers' device IDs refer to the order of the runtime args.

device: 0

in a prototxt would mean the first --gpu arg device ID.
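A sketch of the indirection being proposed (hypothetical code, not part of Caffe; `ResolveDevice` and its signature are assumptions): the layer's `device` field indexes into the ordered list of --gpu args rather than naming a physical device, so the same prototxt can run on whatever GPUs are supplied at launch.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Map a layer's `device` index to a physical GPU ID via the ordered
// --gpu argument list. E.g. with `--gpu 3,7`, a layer with `device: 0`
// runs on physical GPU 3 and `device: 1` runs on GPU 7.
int ResolveDevice(const std::vector<int>& gpu_flags, int layer_device_index) {
  if (layer_device_index < 0 ||
      layer_device_index >= static_cast<int>(gpu_flags.size())) {
    throw std::out_of_range("layer device index exceeds the --gpu list");
  }
  return gpu_flags[layer_device_index];
}
```

This keeps physical device IDs out of the model prototxt entirely; only the runtime invocation binds indices to hardware.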

@Yangqing
Member Author

@sguada @shelhamer

If I may be so bold - I would actually vote against having device_id in the layer definition. The reason is probably the same as that for removing the device_id in the solver proto: device_id is really a scheduling thing and not a mathematical thing, so we probably should have the layers proto defining the network architecture, the solver proto defining the solver math, and a separate way (either in the form of commandline arguments, or a third (oh my) proto) defining how training should be carried out on device (including multi GPUs). In this way we hopefully won't mix things up at multiple places :)

@shelhamer
Member

@Yangqing sure, I think we agree in spirit that the architecture and parallelism should be defined distinctly. Whether they are both in the model proto or a third "execution" proto doesn't inspire any strong feelings for me.

I like your proposal for keeping them distinct, but it may grow complicated since the architecture and execution protos will have to refer to each other to say where to do data / model parallelism.

Details, details...


@shelhamer
Member

Fixed by command line override in #961.
