Implementation of our 2nd place solution to the Booking.Com Data Challenge Competition. The aim of the challenge is to make the best recommendation for the next destination of a user trip, based on a dataset of millions of real, anonymized accommodation reservations. First, we use Cleora, our graph embedding method, to represent cities as a directed graph and learn their vector representations. Next, we apply EMDE to predict the next user destination based on previously visited cities and some features associated with each trip.
An example of the full EMDE algorithm run is depicted in the figure below:
- Download the binary Cleora release and add execute permission so that it can be run. Refer to the Cleora GitHub page for more details about Cleora.
- Python 3.7
- Install requirements:
pip install -r requirements.txt
- GPU for training
- Download the training set (booking_train_set.csv) from the challenge competition website.
- Split the dataset into a training set and a validation set that imitates the hidden test set:
python train_valid_split.py --train booking_train_set.csv
This script will create three files: data/valid.csv, data/train.csv, and data/ground_truth.csv
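To illustrate what "imitating the hidden test set" can mean, here is a minimal sketch, not the code of train_valid_split.py. It assumes that each reservation is a dict with `utrip_id`, `checkin` (an ISO date string, so string order equals date order), and `city_id`, and that a held-out trip has its final city masked with 0 while the true value goes to a separate ground-truth list. The function name `split_trips`, the `valid_fraction` parameter, and the masking value are all assumptions made for this example.

```python
import random
from collections import defaultdict


def split_trips(rows, valid_fraction=0.1, seed=0):
    """Split reservation rows into train, valid, and ground-truth lists.

    Hypothetical helper: `rows` is a list of dicts with at least
    `utrip_id`, `checkin` (ISO date string), and `city_id`.
    """
    # Group reservations by trip and put each trip in chronological order.
    trips = defaultdict(list)
    for row in rows:
        trips[row["utrip_id"]].append(row)
    for trip in trips.values():
        trip.sort(key=lambda r: r["checkin"])

    # Hold out whole trips so that no trip is split across the two sets.
    trip_ids = sorted(trips)
    random.Random(seed).shuffle(trip_ids)
    n_valid = max(1, int(len(trip_ids) * valid_fraction))
    valid_ids = set(trip_ids[:n_valid])

    train, valid, ground_truth = [], [], []
    for trip_id in sorted(trips):
        trip = trips[trip_id]
        if trip_id not in valid_ids:
            train.extend(trip)
            continue
        # Mask the final destination (assumption: 0 marks the hidden city).
        *history, last = trip
        valid.extend(history)
        valid.append({**last, "city_id": 0})
        ground_truth.append({"utrip_id": trip_id, "city_id": last["city_id"]})
    return train, valid, ground_truth
```

Splitting by trip rather than by row keeps every validation trip complete except for its last city, which is the quantity the model has to predict.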
- Compute city sketches using Cleora and EMDE:
python encoding.py --train data/train.csv --test data/valid.csv
This script will create LSH codes for each city and save them into data/codes
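As a rough illustration of what an LSH code for a city is, the sketch below hashes embedding vectors with plain random hyperplanes: each hyperplane contributes one bit, and several independent groups of hyperplanes give several bucket ids per city. This is a simplified stand-in; encoding.py may partition the embedding space differently (for example with data-dependent thresholds). The function name `lsh_codes`, its parameters, and the embedding values in the usage example are hypothetical.

```python
import random


def lsh_codes(embeddings, n_sketches=4, n_bits=3, seed=0):
    """Map each item to `n_sketches` bucket ids in the range [0, 2**n_bits).

    Hypothetical helper: `embeddings` maps an item id to a list of floats.
    """
    rng = random.Random(seed)
    dim = len(next(iter(embeddings.values())))
    # One group of `n_bits` random hyperplanes per independent sketch.
    planes = [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        for _ in range(n_sketches)
    ]
    codes = {}
    for item, vec in embeddings.items():
        item_codes = []
        for sketch_planes in planes:
            bucket = 0
            for plane in sketch_planes:
                dot = sum(p * v for p, v in zip(plane, vec))
                # Append one bit: which side of the hyperplane the vector is on.
                bucket = (bucket << 1) | int(dot >= 0.0)
            item_codes.append(bucket)
        codes[item] = item_codes
    return codes


# Made-up 2-dimensional embeddings for three city ids.
city_vectors = {8183: [0.1, 0.9], 15626: [0.1, 0.9], 60902: [-0.5, 0.2]}
city_codes = lsh_codes(city_vectors)
```

Cities with identical vectors always receive identical codes, and cities with similar vectors tend to share buckets in most sketches, which is what lets a set of visited cities be summarized as bucket counts.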
- Prepare the dataset by converting it from CSV format to a dictionary. An example datapoint looks like:
{'checkout_last': Timestamp('2016-08-18 00:00:00'), 'checkin_last': Timestamp('2016-08-16 00:00:00'), 'day_in': 17, 'day_out': 20, 'month_in': 7, 'year_in': 0, 'checkout': Timestamp('2016-08-21 00:00:00'), 'checkin': Timestamp('2016-08-18 00:00:00'), 'cities': [8183, 15626, 60902], 'utrip_id': '1000027_1', 'device_class': 0, 'affiliate_id': 1, 'booker_country': 0, 'num_cities_in_trip': 4, 'hotel_country': 0, 'is_target_last_city': True, 'target': 30628}
python data_preprocessing.py --train data/train.csv --test data/valid.csv --ground-truth data/ground_truth.csv
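The sketch below shows one way a subset of the fields in the example datapoint could be derived from the rows of a single trip. It is not the code of data_preprocessing.py. The zero-based calendar features (`day_in`, `day_out`, `month_in`) and the year offset are inferred from the example values only, and the function name `build_datapoint`, the `base_year` parameter, and the use of `datetime` instead of pandas `Timestamp` are assumptions.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"


def build_datapoint(trip, base_year=2016):
    """Turn one chronologically ordered trip into a single example dict.

    Hypothetical helper: `trip` is a list of dicts with `utrip_id`,
    `city_id`, `checkin`, and `checkout` (ISO date strings).
    """
    # The last reservation is the prediction target; the rest is history.
    *history, target_row = trip
    previous = history[-1]
    checkin = datetime.strptime(target_row["checkin"], DATE_FORMAT)
    checkout = datetime.strptime(target_row["checkout"], DATE_FORMAT)
    return {
        "utrip_id": target_row["utrip_id"],
        "cities": [row["city_id"] for row in history],
        "num_cities_in_trip": len(trip),
        "checkin_last": datetime.strptime(previous["checkin"], DATE_FORMAT),
        "checkout_last": datetime.strptime(previous["checkout"], DATE_FORMAT),
        "checkin": checkin,
        "checkout": checkout,
        # Zero-based calendar features (assumed from the example datapoint).
        "day_in": checkin.day - 1,
        "day_out": checkout.day - 1,
        "month_in": checkin.month - 1,
        "year_in": checkin.year - base_year,
        "target": target_row["city_id"],
    }
```

Categorical fields such as `device_class`, `affiliate_id`, and `booker_country` appear as integers in the example, so the real preprocessing presumably also maps them to integer ids; that step is left out here.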
- Train the model:
python train.py