The main objective of this code is to extract video-action features using pretrained models from the PySlowFast framework. The code provided here focuses only on feature extraction using the Decord library.
Features are extracted by taking the output just before the head of each model in the PySlowFast framework. For each architecture, this yields a temporal component for each time segment.
To use the code, read the "installation" and "How to use" sections. Before running the script, set the relevant inputs for each model in the configuration file.
The features are obtained with pretrained models that use different sampling rates. The frames of each video are traversed iteratively according to the sampling rate of each model. When the number of frames in a video is not a multiple of the model's sampling rate, the last bucket is filled by randomly sampling from the last frames, so that its temporal information is still captured.
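The traversal described above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the function name `clip_indices`, its parameters (`total_frames`, `clip_len`, `sampling_rate`, `seed`), and the choice to sample the last bucket with replacement and sort the result are all assumptions made for the example.

```python
import random


def clip_indices(total_frames, clip_len, sampling_rate, seed=0):
    """Return a list of frame-index lists, one per clip (hypothetical helper).

    Full clips take `clip_len` frames spaced `sampling_rate` apart.
    If frames are left over, a final clip is built by randomly sampling
    `clip_len` indices from those leftover frames (an assumed policy).
    """
    rng = random.Random(seed)          # seeded so the sketch is reproducible
    span = clip_len * sampling_rate    # raw frames covered by one full clip
    clips = []
    start = 0

    # Walk the video in non-overlapping windows of `span` frames.
    while start + span <= total_frames:
        clips.append(list(range(start, start + span, sampling_rate)))
        start += span

    # Leftover frames: fill the last bucket by random sampling.
    if start < total_frames:
        tail = list(range(start, total_frames))
        # choices() samples with replacement, so it also works when the
        # tail holds fewer than `clip_len` frames; sorting keeps time order.
        clips.append(sorted(rng.choices(tail, k=clip_len)))

    return clips


# Example: a 70-frame video, 8 frames per clip, sampling rate 8.
clips = clip_indices(70, clip_len=8, sampling_rate=8)
print(len(clips))   # one full clip plus one randomly filled clip
print(clips[0])     # evenly spaced indices from the first window
print(clips[1])     # indices drawn from the leftover frames 64..69
```

The resulting index lists could then be passed to a video reader to fetch the frames for each clip; how the real script batches and reads frames is not shown here.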
To install and run this code, you must first install the PySlowFast framework. In addition, install the following packages:
pip install scipy
pip install moviepy
pip install decord
Note: moviepy sometimes causes problems when running the code. If that happens, try reinstalling it:
pip uninstall moviepy
pip install moviepy
To run the code, follow these instructions: HOWTOUSE.md contains the execution script for each supported model (see supported models here), and checkpoints contains the different models pretrained by Meta.
To load the ResNet, SlowFast and MViT models, use the following weights.