
Can we further improve AutoSlim without the gate? #17

Open
twmht opened this issue Mar 19, 2022 · 3 comments


twmht commented Mar 19, 2022

It is not easy to deploy the gate operator with some other backends, like TensorRT.

So my question is: can we further improve AutoSlim without the dynamic gate at inference time? Is there any ongoing work on this?


twmht commented Mar 19, 2022

Or can we just remove the dynamic gate after training, then run the greedy AutoSlim search, and find a better result?
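
(For concreteness, a gate-free greedy slimming loop along those lines might look roughly like the sketch below. The `set_widths`, `evaluate`, and `count_flops` helpers are hypothetical placeholders for whatever the supernet implementation actually exposes.)

```python
def greedy_slim(supernet, val_loader, width_choices, flops_budget,
                evaluate, count_flops):
    """Greedily slim one layer at a time until the FLOPs budget is met."""
    # Start from the largest width choice at every slimmable layer.
    widths = [max(choices) for choices in width_choices]
    while count_flops(widths) > flops_budget:
        best_acc, best_widths = -1.0, None
        for i, choices in enumerate(width_choices):
            smaller = [c for c in choices if c < widths[i]]
            if not smaller:
                continue  # this layer is already at its minimum width
            trial = list(widths)
            trial[i] = max(smaller)  # slim this layer by one step
            supernet.set_widths(trial)
            acc = evaluate(supernet, val_loader)
            if acc > best_acc:
                best_acc, best_widths = acc, trial
        if best_widths is None:
            break  # nothing left to slim
        widths = best_widths  # keep the step that hurts validation accuracy least
    return widths
```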

changlin31 (Owner) commented

Hi, @twmht

Our supernet does not contain a dynamic gate, so you can use the AutoSlim algorithm to find the most suitable sub-network in the supernet. Please note that our routing space (or search space) is small (only 14 sub-networks), because we need to save BN statistics for every sub-network. If you want to perform NAS (e.g. AutoSlim), you could enlarge the search space.
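
(A side note, not specific to this repository: if the search space were enlarged beyond the sub-networks whose BN statistics are stored, a common workaround is to recalibrate the BN running statistics on a few batches before evaluating each candidate. A minimal PyTorch sketch, assuming `subnet` is the supernet already switched to the candidate widths:)

```python
import torch

@torch.no_grad()
def recalibrate_bn(subnet, calib_loader, num_batches=32):
    """Reset and re-estimate BN running statistics for one candidate sub-network."""
    for m in subnet.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # None -> cumulative moving average over calibration
    subnet.train()  # BN only updates running statistics in train mode
    for i, (images, _) in enumerate(calib_loader):
        subnet(images)
        if i + 1 >= num_batches:
            break
    subnet.eval()
```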

During our supernet training, we use in-place bootstrapping, which outperforms the in-place distillation used in the original AutoSlim paper (by around 1~2%). So we expect that searching in our supernet can lead to better results than AutoSlim.
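
(To illustrate the difference, here is a rough sketch of one training step in the spirit of in-place bootstrapping: the sampled sub-networks learn from the soft targets of an EMA copy of the full-width network, instead of the online full-width outputs used by in-place distillation. The `set_widths` method, the `full_widths`/`sampled_widths` arguments, and the loss weighting are assumptions for illustration, not this repository's actual training code.)

```python
import torch
import torch.nn.functional as F

def ib_step(supernet, ema_supernet, images, labels, full_widths,
            sampled_widths, optimizer, ema_decay=0.999):
    """One in-place-bootstrapping-style step for a slimmable supernet."""
    optimizer.zero_grad()

    # 1) Full-width sub-network: ordinary cross-entropy on the labels.
    supernet.set_widths(full_widths)
    loss = F.cross_entropy(supernet(images), labels)

    # 2) The EMA teacher at full width produces the soft targets.
    with torch.no_grad():
        ema_supernet.set_widths(full_widths)
        teacher = F.softmax(ema_supernet(images), dim=1)

    # 3) Sampled narrower sub-networks distill from the EMA teacher.
    for widths in sampled_widths:
        supernet.set_widths(widths)
        student_log_probs = F.log_softmax(supernet(images), dim=1)
        loss = loss + F.kl_div(student_log_probs, teacher, reduction="batchmean")

    loss.backward()
    optimizer.step()

    # 4) Move the EMA teacher towards the online weights.
    with torch.no_grad():
        for p_ema, p in zip(ema_supernet.parameters(), supernet.parameters()):
            p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)
```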


twmht commented Mar 19, 2022

@changlin31

I am not sure whether performance can be boosted if I use the dynamic gate for training but run inference without it.

For example, without distillation, have you ever compared the following two setups?

  1. Train with the dynamic gate, remove the gate, then search the supernet for a sub-network and measure accuracy1.

  2. Train without the dynamic gate, like a normal slimmable network, find a sub-network with FLOPs similar to setup 1, and measure accuracy2.

I am curious whether accuracy1 is better than accuracy2. If it is, then I can conclude that gate training helps boost the performance of a slimmable network.

By the way, regarding distillation for object detection: I am trying feature-map distillation, but the results are not good. Maybe feature-map distillation does not make sense for a slimmable network, since the weights are shared by all sub-networks and the supernet. So I am wondering how you do distillation for object detection?
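
(For concreteness, the kind of per-level feature-map distillation loss discussed here could be sketched as below. The 1x1 adapters that project the narrower student features to the teacher's channel count are an assumption for illustration, not necessarily what was actually tried, and whether this helps a weight-shared slimmable detector is exactly the open question.)

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillLoss(nn.Module):
    """MSE between adapted student feature maps and detached teacher feature maps."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 convs project the slimmed (narrower) student features to the
        # teacher's channel count at each feature-pyramid level.
        self.adapters = nn.ModuleList(
            nn.Conv2d(s, t, kernel_size=1)
            for s, t in zip(student_channels, teacher_channels)
        )

    def forward(self, student_feats, teacher_feats):
        loss = 0.0
        for adapter, fs, ft in zip(self.adapters, student_feats, teacher_feats):
            loss = loss + F.mse_loss(adapter(fs), ft.detach())
        return loss / len(self.adapters)
```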
