Can I feed more than 3 channels to this model? E.g. (6, 244, 244) instead of RGB data in (3, 244, 244) #4

xiaolongwu0713 · 2021-08-28T11:11:18Z

Hi,
I understand that Visformer accepts input in a 3D format such as (3, 244, 244). Can I feed data with more than 3 channels into the model, e.g. (16, 244, 244)?

danczs · 2021-08-28T11:25:33Z

Yeah, by changing the input setting in the stem layer, a model can handle different channel numbers.
In visformer, you can change the stem setting in model.py from:
    self.stem = nn.Sequential(
        nn.Conv2d(3, self.init_channels, 7, stride=2, padding=3, bias=False),
        BatchNorm(self.init_channels),
        nn.ReLU(inplace=True)
    )
to
    self.stem = nn.Sequential(
        nn.Conv2d(16, self.init_channels, 7, stride=2, padding=3, bias=False),
        BatchNorm(self.init_channels),
        nn.ReLU(inplace=True)
    )
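
To illustrate what the stem change above affects, here is a minimal sketch in plain Python (no PyTorch needed) that works out the weight shape, output shape, and parameter count of the first convolution for 3 and 16 input channels. The kernel size, stride, and padding are taken from the stem quoted above. The helper names, the value `init_channels=32`, and the 224x224 input size are assumptions made only for this illustration.

```python
def conv2d_out_size(size, kernel, stride, padding):
    """Spatial output size of a 2D convolution (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_conv_summary(in_chans, init_channels, height, width):
    """Describe the first stem convolution for a given number of input channels.

    Hypothetical helper: it only does shape arithmetic and builds no model.
    """
    kernel, stride, padding = 7, 2, 3  # values from the stem quoted above
    weight_shape = (init_channels, in_chans, kernel, kernel)
    out_shape = (
        init_channels,
        conv2d_out_size(height, kernel, stride, padding),
        conv2d_out_size(width, kernel, stride, padding),
    )
    n_params = init_channels * in_chans * kernel * kernel  # bias=False, so no bias term
    return weight_shape, out_shape, n_params


# init_channels=32 and a 224x224 input are assumed values for this example.
rgb = stem_conv_summary(3, 32, 224, 224)
multi = stem_conv_summary(16, 32, 224, 224)

print("3 channels: ", rgb)
print("16 channels:", multi)
```

Under these assumptions only the weight tensor of the first convolution grows with the channel count. The stem's output shape is unchanged, so the rest of the network should not need to be modified.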

xiaolongwu0713 · 2021-08-28T22:45:21Z

Thanks for your swift reply.
I found that there is also a Visformer implementation in the timm repo: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/visformer.py. I hope there is no discrepancy between the two implementations? In timm's repo, I need to change the _cfg function to reflect the input channels, as shown below.
def _cfg(url='', **kwargs):
    return {
        'url': url,
        'num_classes': 1000,
        'input_size': (16, 224, 224),  # 16 is the number of input channels
        'pool_size': None,
        'crop_pct': .9,
        'interpolation': 'bicubic',
        'fixed_input_size': True,
        'mean': IMAGENET_DEFAULT_MEAN,
        'std': IMAGENET_DEFAULT_STD,
        'first_conv': 'stem.0',
        'classifier': 'head',
        **kwargs
    }
Can you verify that this achieves the same result as your proposal? Otherwise, the timm repo might need to be updated.
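
As a side note on the _cfg function quoted above: because `**kwargs` is unpacked last in the dictionary, a caller can override `input_size` without editing the function body. The sketch below is a standalone copy in plain Python, not timm's actual module. The mean and std tuples are the usual ImageNet values, `channel_stats_match` is a hypothetical helper, and the 0.5 statistics are placeholders rather than recommended values. It only describes the config dictionary. Whether the model's first convolution picks up the channel count from it depends on how the model is constructed, which this sketch does not cover.

```python
# Standalone stand-ins for the constants used by the quoted _cfg function.
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)


def _cfg(url='', **kwargs):
    # Same layout as the function quoted above, but with the 3-channel default;
    # entries passed through kwargs replace the defaults listed before them.
    return {
        'url': url,
        'num_classes': 1000,
        'input_size': (3, 224, 224),
        'pool_size': None,
        'crop_pct': .9,
        'interpolation': 'bicubic',
        'fixed_input_size': True,
        'mean': IMAGENET_DEFAULT_MEAN,
        'std': IMAGENET_DEFAULT_STD,
        'first_conv': 'stem.0',
        'classifier': 'head',
        **kwargs,
    }


def channel_stats_match(cfg):
    """Hypothetical check: do mean/std have one entry per input channel?"""
    chans = cfg['input_size'][0]
    return len(cfg['mean']) == chans and len(cfg['std']) == chans


default_cfg = _cfg()
cfg16 = _cfg(input_size=(16, 224, 224))

# Placeholder per-channel statistics (0.5 is an assumed value for illustration).
cfg16_stats = _cfg(input_size=(16, 224, 224), mean=(0.5,) * 16, std=(0.5,) * 16)

print(channel_stats_match(default_cfg))  # 3 channels, 3-entry mean/std
print(channel_stats_match(cfg16))        # 16 channels, still 3-entry mean/std
print(channel_stats_match(cfg16_stats))  # 16 channels, 16-entry mean/std
```

The default mean and std have three entries, so a 16-channel config would presumably also need its own per-channel normalization statistics if the data pipeline reads them from this dictionary.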

danczs · 2021-08-29T10:15:56Z

This _cfg function should work well. I checked the repo and did not find any difference. I will try to train the models in timm's repo and report the results here.
Thanks for your suggestion!

xiaolongwu0713 · 2021-08-29T23:31:25Z

Thank you for the time you put into this. Looking forward to your test results.

danczs · 2021-09-06T03:26:03Z

We trained the timm-visformer-S and got 82.154%, which is similar to our newly reported result of 82.19% (average of 3 runs). So there is basically no difference between the two models.
