Hello, thank you for your great work. I noticed that among the open checkpoints, every checkpoint trained on video data has its compress type set to 'mean' (or 'mean_concat', though I couldn't find the corresponding logic in the code). Does this mean that all video-based checkpoints are trained with 2 tokens, regardless of whether the training data consists of short or long videos?