Is it possible to use user or item embeddings with this library? #23
I used the 2nd embedding and got a recall of 0.44 and an MRR of 0.16.
@hodaraad There is an option for using an embedding before the GRU layer. You can (1) use a separate embedding layer, or (2) use the constrained embedding that shares the item representations between the input and the output. Both of these settings make the network learn the embedding during training, along with the session dynamics. You can not use pretrained embeddings with this code without some modifications. I experimented with the pretrained approach (using both content-based and CF-pretrained embeddings), but it didn't improve the model.

Regarding one-hot encoding and memory consumption: if you use one-hot encoding, you basically do an indexing; you NEVER store your data as a big matrix with one non-zero value per row and a bunch of zeros. So there is no additional memory requirement for the one-hot encoding approach besides keeping a map from the original item IDs of your data to 1...N, which is nothing compared to the network itself (less than 1MB for 330K items). Moreover, even if you use an embedding, you still need this indexing to feed the network with inputs.

The memory bottleneck is always the weight matrices that are indexed by the items: Wx of the GRU if there is no embedding (or the embedding matrix if there is one), plus Wy of the output layer. Assuming 500K items, 100 as the first/last layer and embedding size, and 32-bit floats, Wy uses up ~191MB of memory. During training you also need a matrix of the same size for the accumulated gradients (if you use adaptive learning rate methods, such as Adagrad or Adam) and another one for the velocity (if you use momentum). So the constrained embedding uses ~572+X, the standard setting uses ~2288+X, and the embedding setup uses ~1144+X megabytes of memory, where X is the rest of the network (negligible compared to the size of these matrices).

Eliminating every item-indexed matrix is possible in theory if the output of the network is not the predicted item but the embedding of the predicted item (with an L2 or cosine similarity loss on the output). However, this approach is very inaccurate when it comes to top-N recommendation accuracy.
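The memory figures above can be reproduced with a quick back-of-the-envelope script. The matrix counts per setup (1, 4, and 2 item-indexed matrices) are my reading of the reply: the "standard setting" counts the three GRU input matrices (Wx for the three gates) plus Wy, and the factor of 3 covers the weights plus the gradient accumulator and velocity copies.

```python
ITEMS = 500_000   # number of items, as in the example above
DIM = 100         # first/last layer and embedding size
BYTES = 4         # 32-bit floats
COPIES = 3        # weights + adaptive-LR accumulator + momentum velocity

def setup_mb(n_item_indexed_matrices):
    """Memory (MB) used by all item-indexed matrices of a given setup."""
    return n_item_indexed_matrices * ITEMS * DIM * BYTES * COPIES / 2**20

print(int(setup_mb(1)))  # constrained embedding (one shared matrix)        -> 572
print(int(setup_mb(4)))  # standard setting: 3 GRU input matrices (Wx) + Wy -> 2288
print(int(setup_mb(2)))  # embedding setup: embedding matrix + Wy           -> 1144
```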
Hi, can you provide the version with the pretrained approach?
The first paper says that you also tried adding an additional embedding layer, but one-hot encoding resulted in better performance.
I wonder what kind of embedding you used for that experiment? I'm interested in 2 types of embeddings:
1- Something similar to what LightFM uses: representing items by their content info, instead of only their IDs, to solve the item cold-start problem.
2- Something similar to what the TensorRec framework allows: transforming the original high-dimensional item vectors into linear or non-linear representations of much lower dimensionality that map similar items to nearby points in the embedded space.
In particular, I'm wondering whether you had any memory/performance problems when dealing with those very big high-dimensional matrices; in the paper's video dataset you had 330 thousand videos, which makes for a huge matrix when represented with one-hot encoding.
Thanks
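On the memory concern: as the reply above notes, a one-hot input to the first layer is equivalent to indexing one row of the weight matrix, so the full one-hot matrix never needs to be materialized. A minimal NumPy sketch of that equivalence (the matrix here is random and purely illustrative):

```python
import numpy as np

n_items, dim = 330_000, 100  # e.g. the video dataset's item count
rng = np.random.default_rng(0)
W = rng.random((n_items, dim), dtype=np.float32)  # first-layer item weights

item_idx = 42  # hypothetical internal item index

# Multiplying a one-hot vector by W just selects row item_idx,
# so in practice you only ever store the integer index.
onehot = np.zeros(n_items, dtype=np.float32)
onehot[item_idx] = 1.0
assert np.allclose(onehot @ W, W[item_idx])
```

Only the ID-to-index map and the weight matrix itself take memory; the "one-hot data matrix" never exists.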