-
Notifications
You must be signed in to change notification settings - Fork 6.4k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add product recommendation for automl tables notebook (#2257)
* added colab filtering notebook * update to tables client * update readme * tell user to restart kernel for automl
- Loading branch information
1 parent
7b529ee
commit 851525c
Showing
2 changed files
with
869 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# Product Recommendation with AutoML Tables | ||
[AutoML Tables](https://cloud.google.com/automl-tables/) is a service for automating data proprocessing, model selection and training, and prediction for structured data. This tutorial demonstrates how AutoML Tables can be used to create product recommendations for users given a history of past user-product interactions. | ||
|
||
## Problem | ||
For online retailers, one key problem to solve is how to get the right products in front of customers to lead to a conversion. Often, these retailers will have huge product catalogs and a diverse pool of users. Additionally, it's typical for there to be plenty of noisy implicit feedback, and comparitively little explicit feedback. For example, in this notebook we will demonstrate how recommendations can be made to thousands of users from a catalog containing millions of songs. Although there is no information about users explicitly liking songs, the dataset does log every time a user listens to a song. | ||
|
||
## Approach | ||
A very common approach to solving product recommendation problems is to use matrix factorization (MF) as seen [in this solution](https://cloud.google.com/solutions/machine-learning/recommendation-system-tensorflow-overview). At a high level, MF is generally accomplished by creating a user-by-item matrix where each value is some sort of signal for similarity, such as a rating or view count, between the user and item if the pairing exists in the dataset. Depending on the approach, a number of matrices are then learned such that their product has similar values to the original matrix where pairs exist, and the values of unseen user-item pairs can be interpretted as predicted similarity scores. Although MF as it has been described cannot be done using AutoML tables, there is [literature](https://arxiv.org/abs/1708.05031) that argues that an equivalent does exist for deep learning. Better yet, this deep learning approach allows user and item features to be included in model training! | ||
|
||
In this notebook, we use AutoML Tables to train a binary classification model that takes user features and item features from a `(user, item)` pair as input, and outputs a predicted label and similarity score. The label for a sample is 1 if a user has listened to the song more than twice. Once this model is trained, we show how it can be used to construct a lookup table for user-item similarity by predicting a score for every possible pair, and how this table can be used to make recommendations for a user. | ||
|
||
### Alternative Approaches | ||
As the number of `(user, item)` pairs grows exponentially with the number of unique users and items, this lookup table approach may not be optimal for extremely large datasets. One workaround would be to train a model that learns to embed users and songs in the same embedding space, and use a nearest-neighbors algorithm to get recommendations for users. Unfortunately, AutoML Tables does not expose any feature for training and using embeddings, so a [custom ML model](https://github.com/GoogleCloudPlatform/professional-services/tree/master/examples/cloudml-collaborative-filtering) would need to be used instead. | ||
|
||
Another recommendation approach that is worth mentioning is [using extreme multiclass classification](https://ai.google/research/pubs/pub45530), as that also circumvents storing every possible pair of users and songs. Unfortunately, AutoML Tables does not support the multiclass classification of more than [100 classes](https://cloud.google.com/automl-tables/docs/prepare#target-requirements). | ||
|
Oops, something went wrong.