add product recommendation for automl tables notebook (#2257)

* added colab filtering notebook
* update to tables client
* update readme
* tell user to restart kernel for automl
GoogleCloudPlatform · Sep 18, 2019 · 851525c
1 parent 7b529ee
commit 851525c

Showing 2 changed files with 869 additions and 0 deletions.
diff --git a/tables/automl/notebooks/music_recommendation/README.md b/tables/automl/notebooks/music_recommendation/README.md
@@ -0,0 +1,16 @@
+# Product Recommendation with AutoML Tables
+[AutoML Tables](https://cloud.google.com/automl-tables/) is a service for automating data preprocessing, model selection and training, and prediction for structured data. This tutorial demonstrates how AutoML Tables can be used to create product recommendations for users given a history of past user-product interactions.
+
+## Problem
+For online retailers, one key problem is getting the right products in front of customers so that they convert. These retailers often have huge product catalogs and a diverse pool of users. Additionally, there is typically plenty of noisy implicit feedback and comparatively little explicit feedback. For example, in this notebook we demonstrate how recommendations can be made to thousands of users from a catalog containing millions of songs. Although there is no information about users explicitly liking songs, the dataset does log every time a user listens to a song.
+
+## Approach
+A very common approach to product recommendation problems is matrix factorization (MF), as seen [in this solution](https://cloud.google.com/solutions/machine-learning/recommendation-system-tensorflow-overview). At a high level, MF starts from a user-by-item matrix in which each value is some signal of similarity between the user and the item, such as a rating or view count, wherever that pairing exists in the dataset. Depending on the approach, a number of matrices are then learned such that their product has similar values to the original matrix where pairs exist, and the values of unseen user-item pairs can be interpreted as predicted similarity scores. Although MF as described here cannot be done using AutoML Tables, there is [literature](https://arxiv.org/abs/1708.05031) that argues that an equivalent exists for deep learning. Better yet, this deep learning approach allows user and item features to be included in model training!
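As a rough illustration of the factorization described above (not code from the notebook, and not something AutoML Tables runs), the sketch below learns two small factor matrices with plain stochastic gradient descent. The toy play counts, the `factorize` and `score` helpers, and all hyperparameters are assumptions chosen for illustration.

```python
import random


def factorize(observed, n_users, n_items, rank=2, lr=0.02, reg=0.01,
              epochs=3000, seed=0):
    """Learn factors U, V so that dot(U[u], V[i]) approximates observed values."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        # Only pairs that exist in the dataset contribute to the loss.
        for (u, i), value in observed.items():
            pred = sum(a * b for a, b in zip(U[u], V[i]))
            err = value - pred
            for k in range(rank):
                u_k, v_k = U[u][k], V[i][k]
                U[u][k] += lr * (err * v_k - reg * u_k)
                V[i][k] += lr * (err * u_k - reg * v_k)
    return U, V


def score(U, V, u, i):
    """Predicted similarity for any (user, item) pair, seen or unseen."""
    return sum(a * b for a, b in zip(U[u], V[i]))


# Hypothetical toy data: (user index, item index) -> play count.
observed = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (2, 1): 1.0, (2, 2): 5.0}
U, V = factorize(observed, n_users=3, n_items=3)

# Pair (1, 1) never appears in `observed`; its score is a prediction.
unseen_score = score(U, V, 1, 1)
```

The product of the learned factors reproduces the observed entries approximately, and the same dot product yields a score for pairs that were never observed.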
+
+In this notebook, we use AutoML Tables to train a binary classification model that takes user features and item features from a `(user, item)` pair as input, and outputs a predicted label and similarity score. The label for a sample is 1 if a user has listened to the song more than twice. Once this model is trained, we show how it can be used to construct a lookup table for user-item similarity by predicting a score for every possible pair, and how this table can be used to make recommendations for a user.
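The labeling rule and the lookup-table idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: `predict_score` is a hypothetical stand-in for a call to the trained model, and the user IDs, song IDs, and scores are made up.

```python
def make_label(listen_count):
    """Label a (user, song) pair 1 if the user listened more than twice, else 0."""
    return 1 if listen_count > 2 else 0


def build_lookup(users, items, predict_score):
    """Score every possible (user, item) pair once and cache the results."""
    return {(u, i): predict_score(u, i) for u in users for i in items}


def recommend(lookup, user, k=2):
    """Return the k items with the highest cached score for one user."""
    scored = [(s, item) for (u, item), s in lookup.items() if u == user]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


# Hypothetical scores standing in for model predictions.
toy_scores = {
    ("u1", "s1"): 0.6, ("u1", "s2"): 0.1, ("u1", "s3"): 0.9,
    ("u2", "s1"): 0.2, ("u2", "s2"): 0.8, ("u2", "s3"): 0.4,
}


def predict_score(user, item):
    # Placeholder for a real prediction request; an assumption for this sketch.
    return toy_scores[(user, item)]


lookup = build_lookup(["u1", "u2"], ["s1", "s2", "s3"], predict_score)
top_for_u1 = recommend(lookup, "u1", k=2)
```

Once the table is built, serving a recommendation only requires sorting cached scores, with no further model calls.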
+
+### Alternative Approaches
+Because the number of `(user, item)` pairs is the product of the number of unique users and the number of unique items, this lookup table approach may not be optimal for extremely large datasets. One workaround would be to train a model that learns to embed users and songs in the same embedding space, and use a nearest-neighbors algorithm to get recommendations for users. Unfortunately, AutoML Tables does not expose any feature for training and using embeddings, so a [custom ML model](https://github.com/GoogleCloudPlatform/professional-services/tree/master/examples/cloudml-collaborative-filtering) would need to be used instead.
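To make the embedding alternative concrete, here is a minimal sketch of a brute-force nearest-neighbor lookup in a shared embedding space. The vectors, song names, and the choice of cosine similarity are assumptions for illustration; a real system would learn the embeddings with a custom model and would likely use an approximate nearest-neighbor index. For scale, 1,000 users and 1,000,000 songs give 10^9 pairs to store in a lookup table, but only 1,001,000 embedding vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_songs(user_vec, song_vecs, k=2):
    """Rank songs by similarity to the user's vector and keep the top k."""
    ranked = sorted(song_vecs,
                    key=lambda song: cosine(user_vec, song_vecs[song]),
                    reverse=True)
    return ranked[:k]


# Hypothetical 2-dimensional embeddings in a shared user/song space.
song_vecs = {
    "song_a": [1.0, 0.0],
    "song_b": [0.0, 1.0],
    "song_c": [0.7, 0.7],
}
user_vec = [0.9, 0.1]

top = nearest_songs(user_vec, song_vecs, k=2)
```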
+
+Another recommendation approach worth mentioning is [using extreme multiclass classification](https://ai.google/research/pubs/pub45530), as that also avoids storing every possible pair of users and songs. Unfortunately, AutoML Tables does not support multiclass classification with more than [100 classes](https://cloud.google.com/automl-tables/docs/prepare#target-requirements).
+