Not a fully-baked feature request, just a directional hunch. I've found the conclusions from this paper Sampling Matters in Deep Embedding Learning pretty intuitive -- (1) the method for choosing negative samples is critical to the overall embedding, maybe more than the specific loss function, and (2) a distance-weighted sampling of negatives had some nice properties during training and better results compared to uniform random sampling or oversampling hard cases.
I'm brand-new to Annoy, not confident on the implementation details or performance changes here, but I suspect that the prebuilt index could be used for both positive and negative sampling. An example: the current approach draws random negatives in sequence and chooses the first index not in a neighbor list. A distance-weighted approach for choosing a negative for each triplet might work like this:
1. Draw a random set of candidate negatives
2. Drop any candidate negatives already in the neighbor list
3. Choose from the remaining candidates with probabilities proportional to 1/f(dist(i, j)), where f(dist) could be just dist, sqrt(dist), etc.
Annoy gives us the dist(i, j) without much of a performance hit. Weighted choice of the candidate negatives puts a (tunable) thumb on the scale for triplets that contain closer/harder-negative matches.
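A rough sketch of what I have in mind, written as standalone Python against the annoy API rather than anything in ivis itself -- sample_weighted_negative, the candidate pool size of 32, and the sqrt weighting are all just placeholder choices:

```python
import numpy as np
from annoy import AnnoyIndex


def sample_weighted_negative(index, anchor, neighbor_ids, n_candidates=32,
                             f=np.sqrt, rng=None):
    """Pick one negative for `anchor`, up-weighting closer (harder) candidates.

    index        -- a built AnnoyIndex
    anchor       -- row id of the anchor point
    neighbor_ids -- set of row ids already treated as positives for the anchor
    f            -- increasing function of distance; a candidate's weight is 1/f(dist)
    """
    rng = rng or np.random.default_rng()
    n_items = index.get_n_items()

    # 1. Draw a random pool of candidate negatives.
    candidates = rng.integers(0, n_items, size=n_candidates)

    # 2. Drop the anchor itself and anything already in the neighbor list.
    candidates = [int(c) for c in candidates
                  if c != anchor and c not in neighbor_ids]
    if not candidates:
        return int(rng.integers(0, n_items))  # fall back to a plain random draw

    # 3. Weighted choice: Annoy returns dist(i, j) cheaply from the built index,
    #    and closer candidates get proportionally more probability mass.
    dists = np.array([index.get_distance(anchor, c) for c in candidates])
    weights = 1.0 / np.maximum(f(dists), 1e-12)
    return int(rng.choice(candidates, p=weights / weights.sum()))


# Toy usage: a small index over random 8-d data, then one negative for row 0.
index = AnnoyIndex(8, "euclidean")
for i, row in enumerate(np.random.rand(100, 8)):
    index.add_item(i, row)
index.build(2)
positives = set(index.get_nns_by_item(0, 10))
neg = sample_weighted_negative(index, 0, positives)
```

The only extra work per triplet is a handful of get_distance calls against the already-built index, which is why I'd hope the overhead stays small.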
This idea probably adds some hyperparameter-selection headaches. I think the impactful choices are the size of the initial candidate-negative pool and (especially) f(dist).
Hi, thanks for the feedback and the link to the paper, it was a good read!
I like the idea and think it would be interesting to try out and see how it impacts the results of the embeddings. The idea of weighting the sampling of triplets did occur to me back when I was initially writing ivis, but I was thinking about weighting the selection of positive examples rather than negatives at that time, as it would be almost free. Weighting the selection of neighbors never made it past initial prototyping though, since it didn't have a positive impact.
From the paper you linked, it does sound like weighting the sampling of negative examples may well be worth trying. Hopefully the performance impact would be negligible, since Annoy is pretty efficient, but I'll keep an eye on it. It would be nice to keep the number of hyperparameters low, but if weighted sampling brings clear benefits it shouldn't be a problem to expand the number slightly.