
Implements behavioral segmentation of 3,999 airline customers using K-Means and Hierarchical Clustering, and explores possible target marketing options.


imbottlebird/customer_segmentation


Customer Segmentation

  1. Data exploration and preprocessing
  2. K-Means clustering
  3. Hierarchical clustering
  4. Potential targeted marketing based on clusters

1. Data exploration and preprocessing

The dataset covers 3,999 customers from the loyalty program of a former airline.

Six numerical variables describe each customer:

  • Balance: Number of miles eligible for award travel
  • BonusTrans: Number of non-flight bonus transactions in the past 12 months
  • BonusMiles: Number of miles earned from those transactions
  • FlightTrans: Number of flight transactions
  • FlightMiles: Number of miles earned from those transactions
  • DaysSinceEnroll: Tenure in the program (days)

Data preprocessing (scaling)

  • First, 'center' the data by subtracting its mean from each column (the mean of each column becomes 0)
  • Then, 'scale' the data by dividing each column by its standard deviation (the standard deviation of each column becomes 1)
#step 1: create the pre-processor using preProcess
# normalization for each col: (X_i-mean)/std
pp <- preProcess(airline, method=c("center", "scale"))
class(pp)
pp
pp$mean

#step 2: apply it to the dataset
airline.scaled <- predict(pp, airline)

# Sanity check
# mean is (approximately) 0 for all columns.
colMeans(airline)
colMeans(airline.scaled)
apply(airline.scaled,2,sd)
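The same center-and-scale transform can be reproduced with base R's scale(), independent of the caret preprocessing above (a quick self-contained check on a toy matrix):

```r
# Toy matrix: two columns with different means and spreads
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)

# scale() subtracts each column's mean and divides by its standard deviation
m.scaled <- scale(m)

colMeans(m.scaled)       # ~0 for each column
apply(m.scaled, 2, sd)   # 1 for each column
```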
[Figures: the data before and after scaling]

2. K-Means clustering

K-means starts from a random initialization, with the centroids placed at random locations. It then iterates the following two steps until convergence:

  • Assign each observation to the nearest centroid
  • Recalculate centroids as average of assigned observations
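The two steps can be sketched from scratch (a minimal illustration on synthetic 2-D data, with no empty-cluster restarts; the analysis itself relies on the built-in kmeans function):

```r
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)      # 100 synthetic 2-D observations
k <- 3
centroids <- X[sample(nrow(X), k), ]   # random start: k observations as initial centroids

for (iter in 1:100) {
  # Step 1: assign each observation to the nearest centroid (squared Euclidean distance)
  d2 <- sapply(1:k, function(j) rowSums((X - matrix(centroids[j, ], nrow(X), 2, byrow = TRUE))^2))
  clusters <- apply(d2, 1, which.min)
  # Step 2: recompute each centroid as the average of its assigned observations
  new.centroids <- t(sapply(1:k, function(j) {
    pts <- X[clusters == j, , drop = FALSE]
    if (nrow(pts) == 0) centroids[j, ] else colMeans(pts)  # keep old centroid if a cluster empties
  }))
  if (all(abs(new.centroids - centroids) < 1e-9)) break    # converged: centroids stopped moving
  centroids <- new.centroids
}

table(clusters)  # size of each cluster
```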
# The kmeans function creates the clusters
# set the number of k=8
km <- kmeans(airline.scaled, centers = 8, iter.max=100) 
# centers randomly selected from rows of airline.scaled

class(km) # class: kmeans
names(km)

# cluster centroids. Store this result
km.centroids <- km$centers
km.centroids
# cluster for each point. Store this result.
km.clusters <- km$cluster
km.clusters

# the sum of the squared distances of each observation from its cluster centroid => cluster dissimilarity
km$tot.withinss  # cluster dissimilarity

# the number of observations in each cluster
km.size <- km$size
km.size

Scree plot

For k-means, try many values of k and compare their dissimilarity; here, let's test every k from 1 to 100.

k.data <- data.frame(k = 1:100)
k.data$SS <- sapply(k.data$k, function(k) {
  kmeans(airline.scaled, centers = k, iter.max = 100)$tot.withinss
})

# Plot the scree plot.
plot(k.data$k, k.data$SS, type="l")
plot(k.data$k, k.data$SS, type="l", xlim=c(0,40))
axis(side = 1, at = 1:10)

Let's zoom on the smallest k values (1-40) to take a closer look.

To select a "good" k value, pick the one at the corner of the "L", where the slope changes from steep to shallow (the elbow). Here, k=8 seems to be a good pick.
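The elbow can also be read off numerically as the point where the per-cluster improvement in tot.withinss collapses. A self-contained sketch on two synthetic blobs (the same diff can be taken on k.data$SS for the airline data):

```r
set.seed(2)
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))  # two well-separated blobs

SS <- sapply(1:6, function(k) kmeans(X, centers = k, iter.max = 100, nstart = 5)$tot.withinss)

# Drop in dissimilarity from each extra cluster; the elbow is where the drops collapse
improvement <- head(SS, -1) - tail(SS, -1)
round(improvement)  # large drop going to k = 2, small drops afterwards
```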



3. Hierarchical Clustering

Hierarchical clustering does not require pre-specifying the number of clusters, as K-means does. It also has the advantage over K-means of visualizing the clustering process as a tree-based representation called a dendrogram.

Hierarchical clustering computes all pairwise Euclidean distances between the observations. It starts with as many clusters as data points (here, 3,999), then iteratively merges the pair of clusters with the smallest dissimilarity ("closest" to each other) until only a single cluster remains.
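As a toy illustration of such a pairwise distance matrix and the first merge (three made-up 2-D points, using hclust's default complete linkage rather than the ward.D2 criterion used below):

```r
toy <- data.frame(x = c(0, 3, 0), y = c(0, 4, 1))
d.toy <- dist(toy)        # Euclidean by default

as.matrix(d.toy)[1, 2]    # sqrt((0-3)^2 + (0-4)^2) = 5
hc.toy <- hclust(d.toy)   # complete linkage by default
hc.toy$merge              # first merge: points 1 and 3, the closest pair (distance 1)
```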

d <- dist(airline.scaled) # method = "euclidean"
class(d)

# Creates the Hierarchical clustering
hclust.mod <- hclust(d, method="ward.D2")

# The "method=ward.D2" indicates the criterion to select the pair of clusters to be merged at each iteration

# Now, plot the hierarchy structure (dendrogram)
# labels=F (false) to not print text for each of the 3999 observations
plot(hclust.mod, labels=F, ylab="Dissimilarity", xlab = "", sub = "")

Scree Plot

Create the scree plot: dissimilarity for each k.

hc.dissim <- data.frame(k = seq_along(hclust.mod$height),   # index: 1,2,...,length(hclust.mod$height)
                        dissimilarity = rev(hclust.mod$height)) # merge heights in decreasing order
head(hc.dissim)

# Scree plot
plot(hc.dissim$k, hc.dissim$dissimilarity, type="l")

# Let's zoom on the smallest k values:
plot(hc.dissim$k, hc.dissim$dissimilarity, type="l", xlim=c(0,40))
axis(side = 1, at = 1:10)

Based on the same elbow reasoning as before, k=7 seems to be a good pick.

# Improvement in dissimilarity from each additional cluster
hc.dissim.dif <- head(hc.dissim$dissimilarity, -1) - tail(hc.dissim$dissimilarity, -1)
head(hc.dissim.dif, 10)

# construct the clusters with k=7
h.clusters <- cutree(hclust.mod, 7)
h.clusters

# The *centroid* for a cluster is the mean value of all points in the cluster: 
aggregate(airline.scaled, by=list(h.clusters), mean) # Compute centroids

# *size* of each cluster
table(h.clusters)

# Cross-tabulate the two assignments; many zeros mean the clusters
# from kmeans and hierarchical clustering largely "match up"
table(h.clusters, km.clusters)

Visualization

We can visualize the clusters using fviz_cluster.

# install.packages("factoextra"), if not installed
library(factoextra)
# k-means
fviz_cluster(km, data=airline.scaled, geom = "point", alpha=0.4)
# hclust
fviz_cluster(list(data = airline.scaled, cluster = h.clusters), geom="point", alpha=0.4)
[Figures: cluster plots for K-means (k=8) and hierarchical clustering (k=7)]


4. Potential targeted marketing options

K-Means Clusters

| Original variables | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Cluster 8 |
|---|---|---|---|---|---|---|---|---|
| Balance | 61,201 | 57,207 | 127,761 | 91,719 | 31,165 | 168,964 | 566,040 | 168,897 |
| BonusMiles | 19,073 | 7,565 | 58,156 | 16,360 | 2,308 | 44,062 | 52,696 | 46,301 |
| BonusTrans | 17 | 8 | 21 | 17 | 3 | 33 | 19 | 43 |
| FlightMiles | 118 | 147 | 333 | 2,763 | 114 | 5,851 | 1,133 | 14,244 |
| FlightTrans | 0 | 0 | 1 | 8 | 0 | 18 | 4 | 33 |
| DaysSinceEnroll | 2,923 | 6,074 | 5,484 | 3,964 | 2,300 | 5,157 | 6,312 | 3,446 |
| Cluster size | 893 | 1,124 | 504 | 212 | 1,107 | 69 | 76 | 14 |

Option 1: Dormant customer

Clusters 2 and 5 are low-activity customers.
→ Provide promotional one-time events to incentivize new purchases.

Option 2: Point seeker

Customers in Clusters 1 and 3 focus on bonus transactions.
→ Provide targeted bonuses for flying and special offers on bonus transactions.

Option 3: Old guard

Clusters 6 and 7 are long-standing customers with moderate spending.
→ Provide thank-you gifts or special offers for their loyalty.

Option 4: New oil

Clusters 4 and 8 are recent customers with very high spending; retaining them should be a priority.
→ Provide bonus miles, perks, etc.
