kinetics pretrained models #1

Open
fmthoker opened this issue Aug 27, 2021 · 6 comments

Comments

@fmthoker

Thanks for the code. Could you please share the Kinetics-pretrained models? I would like to include your paper in my experimental study.

@BestJuly
Owner

Hi, @fmthoker.
Thank you for your interest.

I have been busy with many things for a long time and have not updated this repo, although we do plan to make all code and trained models public. Sorry about that.

We reported two models (ResNet-18-3D and R(2+1)D-18-3D) pretrained on Kinetics-400. As you requested, I will check the trained models on our server and upload the Kinetics-400 pretrained model by the end of this week.

@fmthoker
Author

Thanks for the quick response. Sure, I can understand that. Please make sure to upload the R(2+1)D-18-3D model pretrained on Kinetics-400, as that is our main interest at the moment.

@BestJuly
Owner

BestJuly commented Aug 29, 2021

Hi, @fmthoker. I have uploaded the requested checkpoint to Google Drive. It was pretrained with our method and can be used for downstream tasks by fine-tuning. Please note that the retrieval accuracies reported in the paper are not based on this model, but you can still get good retrieval accuracy with it (45.4% top-1 in my experiments). For retrieval, you need to modify retrieve_clips.py.

If you use the checkpoint differently in your own experiments, please also pay attention to the keys in the checkpoint dict. You may need to write a function similar to this to adjust the names.
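As an illustration of that kind of name adjustment (this is not the repo's function), here is a minimal sketch that strips a prefix from checkpoint keys. The `module.` prefix is an assumption: it is what `torch.nn.DataParallel` prepends to parameter names, but the actual key names in the shared checkpoint are not stated in this thread, so inspect the keys first. The helper name `strip_prefix` and the sample dict are hypothetical.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from matching keys.

    Keys that do not start with `prefix` are kept unchanged.
    """
    renamed = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        renamed[new_key] = value
    return renamed


# Stand-in for a loaded checkpoint; in practice the values would be tensors.
checkpoint = {"module.conv1.weight": 0, "module.fc.bias": 1, "epoch": 2}
cleaned = strip_prefix(checkpoint)
print(list(cleaned.keys()))
```

The cleaned dict can then be passed to the model's `load_state_dict`, provided the remaining names match the target architecture.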

There are two different kinds of R2plus1D in my code, and R(2+1)D-18-3D is defined here.

The current retrieve_clips.py and ft_classify.py do not yet support R(2+1)D-18-3D. However, its usage is similar to that of the other network backbones, so the scripts should be easy to modify for experiments.

If you need further help, I am glad to share part of my code here. However, the update of this repo might be delayed, because I am afraid I have not been able to find time recently to refactor the code for a public release.

@fmthoker
Author

Thanks for sharing the models. I will try to adapt retrieve_clips.py and ft_classify.py for R(2+1)D-18-3D and get back to you in case of any problems. One question though: you mention that there are two R2plus1D models. Is there a difference between the two?

@BestJuly
Owner

Yes, the main difference lies in the model depth. R2plus1D refers to models with an architecture similar to ResNet but using (2+1)D convolutions instead of 3D ones. R3D and R21D were used in some previous works such as VCP and VCOP, where the network is shallower than ResNet-18-3D (some papers call it 3D-ResNet-18). The numbers of layers per block in ResNet-18 are (2,2,2,2), while in R3D and R21D they are (1,1,1,1). You can check the layer sizes here.
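To make the depth difference concrete, here is a rough sketch of the usual ResNet counting convention. It assumes basic residual blocks with two convolutions each, one stem convolution and one final fully connected layer, and it counts a factorized (2+1)D convolution as a single layer. These conventions and the helper name `conv_depth` are assumptions for illustration, not taken from the repo.

```python
def conv_depth(blocks_per_stage, convs_per_block=2):
    """Count weighted layers: 1 stem conv + residual-block convs + 1 final fc."""
    return 1 + convs_per_block * sum(blocks_per_stage) + 1


resnet18_depth = conv_depth((2, 2, 2, 2))  # 1 + 2*8 + 1 = 18
shallow_depth = conv_depth((1, 1, 1, 1))   # 1 + 2*4 + 1 = 10
print(resnet18_depth, shallow_depth)
```

Under that convention, the (1,1,1,1) configuration has roughly half the weighted layers of the (2,2,2,2) one.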

To use ResNet-18-3D and R2Plus1D, a convenient way is to directly use the models provided by torchvision with pretrained=False for self-supervised learning. Therefore, the network architecture for the model weights I provided is defined in network.py, NOT in r21d.py.

@fmthoker
Author

Thanks for your help so far. I managed to run and use your R2Plus1D model. Could you please share the Kinetics-400 pretrained ResNet-18-3D too?
