
Is it possible to use some user or item embeddings with this library? #23

Open
hodaraad opened this issue Jun 29, 2018 · 3 comments

@hodaraad

The first paper says that you also tried adding an additional embedding layer, but one-hot encoding resulted in better performance.

I wonder what kind of embedding you used for that experiment? I'm interested in two types of embeddings:
1. Something similar to what LightFM uses: representing items by their content information rather than only by their IDs, to address the item cold-start problem.
2. Something similar to what the TensorRec framework allows: transforming the original high-dimensional item vectors into linear or non-linear representations of much lower dimension, mapping similar items to nearby points in the embedded space.

In particular, I'm wondering whether you ran into any memory or performance problems with those very big high-dimensional matrices; for the video dataset in the paper you had 330 thousand videos, which would make for a huge matrix in one-hot encoding.

Thanks

@gds123

gds123 commented Jun 30, 2018

I used the second type of embedding and got recall 0.44 and MRR 0.16.
What performance did you get in your experiments?

@hidasib
Owner

hidasib commented Jul 2, 2018

@hodaraad There is an option for using an embedding before the GRU layer. You can either (1) use the embedding=X parameter in the constructor to define an item embedding of size X; or (2) set constrained_embedding=True to tie the input embedding to the output representation. The latter method is described in the second paper about GRU4Rec (https://arxiv.org/abs/1706.03847). The constrained embedding can improve results over the embedding-less setup, but it depends on the dataset. On the datasets I used, the standard embedding always performed slightly worse than either the embedding-less or the constrained embedding setup.
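To make the two options concrete, here is a minimal NumPy sketch of the difference between a separate input embedding and the constrained (tied) embedding. The matrix names and sizes below are illustrative assumptions, not the actual GRU4Rec internals:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, layer_size = 1000, 100

# Illustrative weights only -- not the real GRU4Rec variables:
E = rng.normal(size=(n_items, layer_size))    # separate input embedding (embedding=X)
Wy = rng.normal(size=(n_items, layer_size))   # output weight matrix

item = 42
h_standard = E[item]       # embedding=X: a dedicated matrix feeds the GRU
h_constrained = Wy[item]   # constrained_embedding=True: reuse the rows of Wy

# The constrained setup drops E entirely, so the input and output
# representations of an item are the same vector.
assert h_standard.shape == h_constrained.shape == (layer_size,)
```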

Both of these settings make the network learn the embedding during training, along with the session dynamics. You cannot use pretrained embeddings with this code without some modifications. I experimented with the pretrained approach (using both content-based and CF pretrained embeddings), but it didn't improve the model.

Regarding one-hot encoding and memory consumption: if you use one-hot encoding, you basically do an indexing; you NEVER store your data as a big matrix with one value per row and a bunch of zeros. So there is no additional memory requirement for the one-hot encoding approach besides keeping a map from the original item IDs of your data to 1...N, which is nothing compared to the network itself (less than 1 MB for 330K items). Moreover, even if you use an embedding, you still need the indexing to feed the network with inputs.
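The equivalence between one-hot multiplication and plain indexing can be sketched in a few lines of NumPy (toy sizes, for illustration only):

```python
import numpy as np

n_items, dim = 5, 3
W = np.arange(n_items * dim, dtype=float).reshape(n_items, dim)

item = 2
one_hot = np.zeros(n_items)
one_hot[item] = 1.0

# Multiplying by a one-hot vector just selects one row of W...
via_matmul = one_hot @ W
# ...which is exactly what indexing does, without ever materializing
# a huge, mostly-zero one-hot matrix:
via_index = W[item]

assert np.array_equal(via_matmul, via_index)

# The only extra memory is a small map from original IDs to indices, e.g.:
id_map = {"video_abc": 0, "video_def": 1}  # hypothetical original ID -> 0..N-1
```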

The memory bottleneck is always the weight matrices that are indexed by the items: the Wx of the GRU if there is no embedding; the embedding matrix E if there is an embedding; and the output weight matrix Wy in all cases. The size of Wx is n_items x (3*first_layer); of E, n_items x embedding; and of Wy, n_items x last_layer. Since constrained embedding only uses Wy, it requires the least amount of memory. The standard (embedding-less) setting requires 4 times more (because of Wx, assuming the first and last layers are of equal size). The basic embedding version requires 2 times more (assuming the embedding size equals the size of the first layer).
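The 1x / 4x / 2x relationship between the three setups follows directly from those matrix shapes; a small sketch of the parameter counts, under the same equal-size assumption:

```python
# Parameter counts of the item-indexed matrices, assuming
# first_layer == last_layer == embedding size (call it `layer`).
def item_indexed_params(n_items, layer):
    Wy = n_items * layer      # output weights, present in every setup
    Wx = n_items * 3 * layer  # GRU input weights (x3 for the GRU's gates)
    E = n_items * layer       # input embedding matrix
    return {
        "constrained_embedding": Wy,   # only Wy
        "no_embedding": Wx + Wy,       # 4x the constrained setup
        "standard_embedding": E + Wy,  # 2x the constrained setup
    }

sizes = item_indexed_params(500_000, 100)
```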

Assuming 500K items, 100 as the first/last layer and embedding size, and 32-bit floats, Wy uses up ~191 MB of memory. During training you also need a matrix of the same size for the accumulated gradients (if you use adaptive learning rate methods, such as Adagrad or Adam) and another one for the velocity (if you use momentum). So the constrained embedding setup uses ~572+X, the standard setting ~2288+X, and the embedding setup ~1144+X megabytes of memory, where X covers the rest of the network (negligible compared to the size of Wy), the internal variables of Theano, and the sample store used to speed up training on the GPU (defined in the train function). The resulting model will be around 200 / 800 / 400 MB respectively.
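These figures can be reproduced with simple arithmetic, under the stated assumptions (500K items, layer/embedding size 100, 4-byte floats, and three item-indexed copies held during training: weights, accumulated gradients, velocity):

```python
n_items, layer, bytes_per_float = 500_000, 100, 4

# One n_items x layer matrix (e.g. Wy), in MB:
wy_mb = n_items * layer * bytes_per_float / 2**20   # ~191 MB

# Training keeps 3 copies of each item-indexed matrix
# (weights + Adagrad/Adam accumulator + momentum velocity):
constrained_mb = 3 * 1 * wy_mb    # only Wy           -> ~572 MB
no_embedding_mb = 3 * 4 * wy_mb   # Wx + Wy (4x)      -> ~2288 MB
embedding_mb = 3 * 2 * wy_mb      # E + Wy (2x)       -> ~1144 MB
```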

Eliminating every item-indexed matrix is possible in theory if the output of the network is not the predicted item, but the embedding of the predicted item (with an L2 or cosine similarity loss on the output). However, this approach is very inaccurate when it comes to top-N recommendation accuracy.

@KunlinY

KunlinY commented Apr 14, 2019

Both of these settings make the network learn the embedding during training, along with the session dynamics. You cannot use pretrained embeddings with this code without some modifications. I experimented with the pretrained approach (using both content-based and CF pretrained embeddings), but it didn't improve the model.

Hi, can you share the version with the pretrained approach?
