Training data for NYU dataset #12

Open
fanglinpu opened this issue Aug 3, 2018 · 6 comments

@fanglinpu
I find that you use all three views of depth images for training in \denseReg-master\data\nyu.py:
```python
def loadAnnotation(self, is_trun=False):
    '''is_trun:
    True: to load 14 joints from self.keep_list
    False: to load all joints
    '''
    t1 = time.time()
    path = os.path.join(self.src_dir, 'joint_data.mat')
    mat = sio.loadmat(path)
    camera_num = 1 if self.subset=='testing' else 3
    joints = [mat['joint_xyz'][idx] for idx in range(camera_num)]
    names = [['depth_{}_{:07d}.png'.format(camera_idx+1, idx+1) for idx in range(len(joints[camera_idx]))] for camera_idx in range(camera_num)]
```

But for a fair comparison, only view 1 images should be used for training.

@melonwan (Owner) commented Aug 3, 2018

Thanks a lot for pointing this out. Actually, we've only used the first view for training; see another part of the code in dataset.py/nyu, line 64, where only the first 1/3 of the data is fed. I leave this interface available for ease of other usage.
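To illustrate the slicing described above: if the annotations from the three cameras are concatenated in view order, keeping only the first third restricts training to view 1. This is a hypothetical sketch, not the repository's code; `keep_first_view` is a made-up helper name.

```python
def keep_first_view(annotations, num_views=3):
    """Return only the view-1 portion of a view-ordered annotation list.

    Assumes the list concatenates equal-length per-view blocks in view order,
    so the first len(annotations) // num_views entries belong to view 1.
    """
    per_view = len(annotations) // num_views
    return annotations[:per_view]

# Illustrative file names in NYU's depth_{view}_{frame}.png naming scheme:
anns = ['depth_{}_{:07d}.png'.format(v, i) for v in (1, 2, 3) for i in range(1, 4)]
print(keep_first_view(anns))  # only the depth_1_* entries remain
```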

@fanglinpu (Author)

Thank you for your answer.

@fanglinpu (Author) commented Aug 3, 2018

I am a little bit confused about the depth normalization: why do the hand depth values range from com[2]-D_RANGE to com[2]+D_RANGE*0.5? The corresponding code is as follows:

```python
def norm_dm(dms, coms):
    def fn(elems):
        dm, com = elems[0], elems[1]
        max_depth = com[2]+D_RANGE*0.5
        min_depth = com[2]-D_RANGE*0.5
        mask = tf.logical_and(tf.less(dm, max_depth), tf.greater(dm, min_depth-D_RANGE*0.5))
        normed_dm = tf.where(mask, tf.divide(dm-min_depth, D_RANGE), -1.0*tf.ones_like(dm))
        return [normed_dm, com]

    norm_dms, _ = tf.map_fn(fn, [dms, coms])

    return norm_dms
```

I think the hand depth values should range from com[2]-D_RANGE*0.5 to com[2]+D_RANGE*0.5, but the provided code is:

```python
mask = tf.logical_and(tf.less(dm, max_depth), tf.greater(dm, min_depth-D_RANGE*0.5))
```
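Restated in NumPy (a sketch, not the repository's TensorFlow code; `D_RANGE` and the sample depths here are made-up illustrative values), the mask as written accepts depths in the asymmetric interval (com[2]-D_RANGE, com[2]+D_RANGE*0.5):

```python
import numpy as np

D_RANGE = 300.0  # illustrative value, not taken from the repository

def norm_dm_np(dm, com_z):
    """Mirror the posted normalization for a single depth map."""
    max_depth = com_z + D_RANGE * 0.5
    min_depth = com_z - D_RANGE * 0.5
    # As in the posted code, the lower bound is min_depth - D_RANGE*0.5,
    # i.e. com_z - D_RANGE, not the symmetric com_z - D_RANGE*0.5.
    mask = (dm < max_depth) & (dm > min_depth - D_RANGE * 0.5)
    return np.where(mask, (dm - min_depth) / D_RANGE, -1.0)

# With com_z = 700: accepted range is (400, 850), not (550, 850).
dm = np.array([400.0, 550.0, 700.0, 900.0])
print(norm_dm_np(dm, com_z=700.0))  # [-1.   0.   0.5 -1. ]
```

Note that depths in (com_z - D_RANGE, com_z - D_RANGE*0.5) fall inside the mask but normalize to negative values, which is presumably the behavior being questioned.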

@melonwan (Owner) commented Aug 5, 2018

These are just some trial-and-error hacky bits.

@fanglinpu (Author)

For the ICVL and MSRA datasets, the cropped image from the testing set is also obtained by exploiting the ground-truth pose; I think this is inappropriate.

@melonwan (Owner) commented Aug 7, 2018

MSRA provides a bounding box as a starting point. It is very easy to crop out the hand from ICVL with heuristics, e.g. depth thresholding.
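A minimal sketch of the depth-thresholding heuristic mentioned above, assuming (as in typical ICVL-style captures) that the hand is the closest object to the camera and 0 marks missing depth. The function name and `margin` value are illustrative, not from the repository.

```python
import numpy as np

def crop_hand(depth, margin=150.0):
    """Return a bounding box (r0, r1, c0, c1) around the nearest blob.

    Keeps pixels within `margin` mm of the nearest valid depth, then
    takes the tight bounding box of that mask.
    """
    valid = depth > 0                        # 0 usually marks missing depth
    nearest = depth[valid].min()             # hand assumed closest to camera
    mask = valid & (depth < nearest + margin)
    rows, cols = np.where(mask)
    return rows.min(), rows.max() + 1, cols.min(), cols.max() + 1

# Synthetic example: a 10x10 background at 1000 mm with a "hand" at 400 mm.
depth = np.full((10, 10), 1000.0)
depth[2:5, 3:6] = 400.0
print(crop_hand(depth))  # (2, 5, 3, 6)
```

In practice one would also reject spurious single-pixel minima (e.g. with a small connected-component or median filter) before thresholding.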
