graphgen doesn't work with DataParallelTable #12

Open
szagoruyko opened this issue Apr 28, 2016 · 2 comments

@szagoruyko
Contributor

repro:

require 'cunn'
require 'cudnn'
local generateGraph = require 'optnet.graphgen'
local iterm = require 'iterm'

local model = nn.DataParallelTable(1)

model:add(cudnn.SpatialConvolution(3,96,7,7,3,3),1)
model:add(cudnn.SpatialConvolution(3,96,7,7,3,3),2)

model:cuda()

local input = torch.randn(32,3,224,224):cuda()

iterm.dot(generateGraph(model, input))

gives

/opt/rocks/distro/install/bin/luajit: /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:141: Unwritable object <userdata> at <?>.<?>.updateOutput.basefunc.errcheck.C
stack traceback:
    [C]: in function 'error'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:141: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:200: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:200: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:200: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    ...istro/install/share/lua/5.1/cudnn/SpatialConvolution.lua:470: in function 'write'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:210: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/nn/Module.lua:107: in function 'clone'
    .../distro/install/share/lua/5.1/cunn/DataParallelTable.lua:634: in function 'applyChanges'
    .../distro/install/share/lua/5.1/cunn/DataParallelTable.lua:472: in function 'apply'
    /opt/rocks/distro/install/share/lua/5.1/optnet/graphgen.lua:221: in function 'generateGraph'
    /tmp/graphgen_fail.lua:15: in main chunk

@fmassa
Owner

fmassa commented Apr 28, 2016

Thanks for the example, Sergey!
I managed to reduce the problem to the following snippet (independent of graphgen or cudnn):

require 'cunn'
model = nn.DataParallelTable(1)
model:add(nn.SpatialConvolution(3,96,7,7,3,3),1)
model:add(nn.SpatialConvolution(3,96,7,7,3,3),2)
model:cuda()
input = torch.randn(32,3,224,224):cuda()
function f(m)
  local ff = m.updateOutput
  m.updateOutput = function(self, i)
    return ff(self, i)
  end
end
model:apply(f)
model:forward(input);

This behaviour is inconsistent with other modules, where everything works as expected.
This looks like a bug in nn.DataParallelTable, or am I missing something?
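
One possible reading of the stack trace in the original report: the failing path `updateOutput.basefunc.errcheck.C` is reached from `clone`, which suggests the serializer walked into the wrapped function through the module instance and then into what the function captured. The plain-Lua sketch below shows only the language-level part of that reading, namely that assigning `m.updateOutput` turns a class method into an instance field whose closure holds the original method as an upvalue. `Module` here is a hypothetical stand-in, not the real `nn.Module`, and nothing in it depends on torch.

```lua
-- Plain-Lua sketch (no torch required). `Module` is a hypothetical
-- stand-in for an nn module class, not the real nn.Module.
local Module = {}
Module.__index = Module

function Module.new()
  return setmetatable({}, Module)
end

function Module:updateOutput(input)
  return input * 2
end

-- Same wrapping pattern as f(m) in the snippet above.
local function wrap(m)
  local ff = m.updateOutput          -- resolved through the metatable
  m.updateOutput = function(self, i) -- stored on the instance itself
    return ff(self, i)
  end
end

local m = Module.new()
local before = rawget(m, 'updateOutput') -- the method lives on the class, not on m
wrap(m)
local after = rawget(m, 'updateOutput')  -- now a closure owned by the instance

-- The closure's only upvalue is the original method, captured as `ff`.
local upname, upvalue = debug.getupvalue(after, 1)

print(before, type(after), upname, upvalue == Module.updateOutput)
```

Anything that copies the instance by walking its fields will therefore also meet the closure and whatever it captured, which is not the case for an unwrapped module.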

fmassa added a commit that referenced this issue Apr 29, 2016
Bypass DataParallelTable. Might not be the best solution, but seems to work

@fmassa
Owner

fmassa commented Apr 29, 2016

@szagoruyko I proposed a quick fix for this issue in 0c7c216. The test snippet you sent works. Could you check whether it works for your models?
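
For reference, a wrapper installed on an instance can be undone in plain Lua by assigning `nil` to the field, after which method lookup falls back to the class. The sketch below illustrates that general pattern only; it is not a description of what 0c7c216 does, and `Module`, `wrap` and `unwrap` are hypothetical names introduced for the example.

```lua
-- Plain-Lua sketch (no torch required). All names are hypothetical.
local Module = {}
Module.__index = Module

function Module.new()
  return setmetatable({}, Module)
end

function Module:updateOutput(input)
  return input + 1
end

local calls = 0 -- counts how often the wrapper runs

local function wrap(m)
  local ff = m.updateOutput
  m.updateOutput = function(self, i)
    calls = calls + 1
    return ff(self, i)
  end
end

-- Assumes the instance had no updateOutput field of its own before wrap();
-- otherwise that original field would have to be saved and restored instead.
local function unwrap(m)
  m.updateOutput = nil -- drop the instance field; lookup falls back to the class
end

local m = Module.new()
wrap(m)
local wrapped_result = m:updateOutput(1) -- goes through the wrapper
unwrap(m)
local plain_result = m:updateOutput(1)   -- goes straight to Module.updateOutput
```

After `unwrap`, the instance no longer carries a closure, so it looks the same to a field-walking copy as it did before wrapping.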
