Can I feed more than 3 channel to this model? like in (6, 244,244) instead of RGB data in (3,244,244) #4

Open
xiaolongwu0713 opened this issue Aug 28, 2021 · 5 comments

Comments

@xiaolongwu0713

Hi,
I understand that Visformer accepts 3D input such as (3,244,244). Can I feed data with more than the 3 RGB channels into the model, for example (16,244,244)?

@danczs
Owner

danczs commented Aug 28, 2021

Yes. By changing the number of input channels in the stem layer, the model can handle a different channel count.
In Visformer, you can change the stem from:
    self.stem = nn.Sequential(
        nn.Conv2d(3, self.init_channels, 7, stride=2, padding=3, bias=False),
        BatchNorm(self.init_channels),
        nn.ReLU(inplace=True)
    )

to
    self.stem = nn.Sequential(
        nn.Conv2d(16, self.init_channels, 7, stride=2, padding=3, bias=False),
        BatchNorm(self.init_channels),
        nn.ReLU(inplace=True)
    )

in model.py.
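
To see what this change affects, here is a minimal sketch in plain Python (no PyTorch needed) of the shape arithmetic for the stem convolution quoted above. The helper names and `init_channels=32` are assumptions made for illustration, not values taken from model.py. It suggests that only the first convolution's weight tensor grows with the channel count, while the stem's output shape stays the same, so the later layers should not need any change.

```python
def conv2d_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a 2D convolution along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_conv_summary(in_chans: int, init_channels: int, height: int, width: int):
    """Weight shape, weight count and output shape of the stem convolution.

    Mirrors nn.Conv2d(in_chans, init_channels, 7, stride=2, padding=3, bias=False).
    """
    kernel, stride, padding = 7, 2, 3
    weight_shape = (init_channels, in_chans, kernel, kernel)
    num_weights = init_channels * in_chans * kernel * kernel
    out_shape = (
        init_channels,
        conv2d_out_size(height, kernel, stride, padding),
        conv2d_out_size(width, kernel, stride, padding),
    )
    return weight_shape, num_weights, out_shape


# init_channels=32 is an assumed value used only for this illustration.
rgb = stem_conv_summary(in_chans=3, init_channels=32, height=224, width=224)
multi = stem_conv_summary(in_chans=16, init_channels=32, height=224, width=224)

print(rgb)    # ((32, 3, 7, 7), 4704, (32, 112, 112))
print(multi)  # ((32, 16, 7, 7), 25088, (32, 112, 112))
```

Because the weight shape of the first convolution depends on the input channel count, weights trained on RGB input would not load into the 16-channel stem without some adaptation.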

@xiaolongwu0713
Author

Thanks for your swift reply.
I found that there is also a Visformer implementation in the timm repo: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/visformer.py. I hope there is no discrepancy between the two implementations? In timm's repo, I need to change the _cfg function to reflect the input channels, like below.
    def _cfg(url='', **kwargs):
        return {
            'url': url,
            'num_classes': 1000, 'input_size': (16, 224, 224), 'pool_size': None,  # 16 in the input channels
            'crop_pct': .9, 'interpolation': 'bicubic', 'fixed_input_size': True,
            'mean': IMAGENET_DEFAULT_MEAN, 'std': IMAGENET_DEFAULT_STD,
            'first_conv': 'stem.0', 'classifier': 'head',
            **kwargs
        }
Can you verify that this achieves the same result as your proposal? Otherwise the timm repo might need to be updated.

@danczs
Owner

danczs commented Aug 29, 2021

This _cfg function should work well. I checked the repo and did not find any difference. I will try to train the models in timm's repo and report the results here.
Thanks for your suggestion!

@xiaolongwu0713
Author

Thank you for the time you put into this. Looking forward to your test results.

@danczs
Owner

danczs commented Sep 6, 2021

We trained the timm-visformer-S and got 82.154%, which is similar to our newly reported result of 82.19% (average of 3 runs). So there is basically no difference between the two models.
