Thank you for your excellent work! How does the model get the box of a certain phrase in a sentence? Right now it seems to me that the model can't do that. Is that right?
I think each phrase in the Flickr30K Entities data is annotated with its own box. As said in your paper, "Flickr30K Entities [38] augments the original Flickr30K [58] with short region phrase correspondence annotations."
Maybe the 'Flickr' dataset you use has only one box annotation per sentence. Is that right? :)
Just as you cited, "Flickr30K Entities [38] augments the original Flickr30K [58] with short region phrase correspondence annotations." This means the original sentences of Flickr30K are split into short phrases, and each phrase is annotated with a bbox. When training on Flickr30K Entities, each sample consists of a phrase and a bbox.
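To make the sample structure concrete, here is a minimal sketch of how one annotated sentence could be flattened into per-phrase training samples. It is not the repository's actual data loader: the names `PhraseAnnotation`, `GroundingSample`, and `sentence_to_samples`, the box layout `(x_min, y_min, x_max, y_max)`, and the example values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class PhraseAnnotation:
    """One short phrase from a caption and the box annotated for it."""
    phrase: str
    box: Box


@dataclass
class GroundingSample:
    """One training sample: an image, a single phrase, and a single box."""
    image_id: str
    phrase: str
    box: Box


def sentence_to_samples(
    image_id: str, annotations: List[PhraseAnnotation]
) -> List[GroundingSample]:
    """Flatten the phrase annotations of one sentence into separate samples."""
    return [
        GroundingSample(image_id=image_id, phrase=ann.phrase, box=ann.box)
        for ann in annotations
    ]


# Made-up annotations for a caption such as
# "A little boy kicks a red ball."
example_annotations = [
    PhraseAnnotation(phrase="A little boy", box=(10, 20, 110, 220)),
    PhraseAnnotation(phrase="a red ball", box=(150, 180, 200, 230)),
]
example_samples = sentence_to_samples("img_0001", example_annotations)
```

Under these assumptions the model never sees the full sentence during training: each phrase is paired with its own box, so one caption with two annotated phrases yields two independent samples.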