Question:

Hi, thanks a lot for your great work. I have some questions about top_down_transform. Why do we multiply masked_x by top_down_transform again, given that we have already obtained the selected features?

top_down_transform = prompt[..., None] @ prompt[..., None].transpose(-1, -2)
x = x @ top_down_transform * 5

Answer:

Hi, that's a good question. This step selects the relevant features along the channel dimension, while the previous selection operates on the spatial dimension. We find that selecting along both dimensions enhances the effect of top-down attention.
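To see why this acts as a channel-dimension selection: the outer product prompt @ prompt^T is a rank-1 matrix, so multiplying each token's feature vector by it keeps only the component of that vector along the prompt direction. The sketch below illustrates this with NumPy in place of PyTorch; the shapes, the random data, and the unit-normalization of the prompt are assumptions for clarity, not taken from the repository.

```python
import numpy as np

# Hypothetical shapes: batch of 2, 4 tokens, 3 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 3))       # token features (after spatial masking)
prompt = rng.standard_normal((2, 3))     # one prompt vector per batch element
# Unit-normalize so p @ p.T is an exact orthogonal projection (assumption).
prompt /= np.linalg.norm(prompt, axis=-1, keepdims=True)

# Rank-1 outer product p p^T, shape (2, 3, 3) -- the channel-selection matrix.
top_down_transform = prompt[..., None] @ prompt[..., None].swapaxes(-1, -2)

# Each token's feature vector is projected onto the prompt direction.
# (The original snippet additionally scales the result by 5.)
y = x @ top_down_transform

# Equivalent closed form: y[b, t] = (x[b, t] . p[b]) * p[b],
# i.e. only the channel component aligned with the prompt survives.
expected = (x @ prompt[..., None]) * prompt[:, None, :]
assert np.allclose(y, expected)
```

So the spatial mask decides which tokens contribute, while this projection decides which channel directions within each token survive, which is why both steps appear even though a selection has already happened.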