Customer Segmentation Using K-Means in Python

KANGKAN KALITA

Customer segmentation is a key application of machine learning in marketing and business analytics. It allows companies to group customers based on shared characteristics, improving targeting strategies, personalization, and customer satisfaction.

In this tutorial, we will use K-Means clustering, an unsupervised machine learning algorithm, to segment customers based on their purchasing behavior.

Objectives:

  • Understand customer segmentation and its importance.
  • Preprocess and analyze customer data.
  • Implement K-Means clustering to create customer segments.
  • Visualize and interpret the results.

Dataset Description

For this project, we will use the Mall Customers Dataset. It contains customer details such as age, spending score, and income, which will help us cluster customers based on shopping patterns.

Columns:

  • CustomerID: Unique customer identifier.
  • Gender: Male/Female.
  • Age: Age of the customer.
  • Annual Income (k$): Yearly income of the customer in thousands.
  • Spending Score (1-100): Customer spending behavior score assigned by the mall.

Tools & Libraries

We will use the following Python libraries:

  • pandas, numpy (Data manipulation)
  • matplotlib, seaborn (Visualization)
  • scikit-learn (Machine learning implementation)

1. Data Collection & Setup

Importing Libraries and Loading Dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load dataset
df = pd.read_csv("Mall_Customers.csv")

# Preview the first five rows (print() so this also works outside a notebook)
print(df.head())

Explanation:

  • We import the required libraries.
  • Load the dataset using pandas.read_csv().
  • Display the first five rows using head().

2. Exploratory Data Analysis (EDA)

Checking for Missing Values

print(df.isnull().sum())

Descriptive Statistics

print(df.describe())

Visualizing Customer Distribution

sns.pairplot(df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']])
plt.show()

Explanation:

  • We check for missing values.
  • Generate summary statistics using describe().
  • Use pair plots to visualize relationships between variables.

3. Data Preprocessing

Selecting Relevant Features

X = df[['Annual Income (k$)', 'Spending Score (1-100)']]

Feature Scaling

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Explanation:

  • We select the Annual Income and Spending Score columns for clustering.
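  • Standardize both features with StandardScaler() so each has zero mean and unit variance. K-Means relies on Euclidean distance, so without scaling the feature with the larger numeric range would dominate the clustering.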
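
To make the scaling step concrete, here is a minimal sketch that compares StandardScaler() with a manual z-score. The four rows in X_toy are made-up values for illustration, not rows from the Mall Customers Dataset, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy values (made up for illustration, not taken from the dataset):
# column 0 stands in for annual income in k$, column 1 for spending score.
X_toy = np.array([[15.0, 39.0],
                  [16.0, 81.0],
                  [60.0, 50.0],
                  [137.0, 18.0]])

scaler_toy = StandardScaler()
X_toy_scaled = scaler_toy.fit_transform(X_toy)

# Manual z-score: subtract the column mean, then divide by the population
# standard deviation (ddof=0), which is what StandardScaler uses.
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# Both approaches should agree up to floating-point error.
print(np.allclose(X_toy_scaled, X_manual))
```

After scaling, both columns are expressed in standard deviations from their own mean, so income and spending score contribute on a comparable footing to the distances K-Means computes.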

4. Applying K-Means Clustering

Finding Optimal Number of Clusters Using Elbow Method

inertia = []
K = range(1, 11)
for k in K:
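    # n_init is set explicitly so results do not depend on the scikit-learn version's default
    kmeans = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)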
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)

plt.figure(figsize=(8, 5))
plt.plot(K, inertia, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method to Determine Optimal Clusters')
plt.show()

Explanation:

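  • The elbow method plots inertia (the sum of squared distances from each point to its assigned cluster center) against the number of clusters. Inertia generally keeps falling as k grows, so we look for the "elbow": the value of k after which adding another cluster no longer reduces inertia substantially.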
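
The definition of inertia can be checked by hand. The sketch below is only an illustration: it uses synthetic points in three loose groups (not the mall data) and hypothetical variable names, then recomputes the sum of squared distances and compares it with the inertia_ attribute.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points in three loose groups (illustrative only).
rng = np.random.default_rng(0)
centers_true = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack(
    [c + rng.normal(scale=0.5, size=(30, 2)) for c in centers_true]
)

model = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
model.fit(points)

# Inertia by hand: squared distance from each point to the center of the
# cluster it was assigned to, summed over all points.
assigned_centers = model.cluster_centers_[model.labels_]
manual_inertia = ((points - assigned_centers) ** 2).sum()

# The manual value should match the fitted model's inertia_.
print(np.isclose(manual_inertia, model.inertia_))
```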

Training K-Means Model

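# n_init is set explicitly so results do not depend on the scikit-learn version's default
kmeans = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)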
kmeans.fit(X_scaled)
df['Cluster'] = kmeans.labels_

Explanation:

  • We fit the K-Means model with n_clusters=5, the value chosen from the elbow plot.
  • Assign cluster labels to customers.
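
Cluster labels are easier to interpret once the centroids are expressed in the original units. The following is a sketch under stated assumptions: the demo DataFrame is synthetic (two artificial groups, not the real dataset), and names such as demo, demo_scaler, and profile are hypothetical. It maps the centroids back with inverse_transform() and builds a per-cluster summary with groupby().

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = ['Annual Income (k$)', 'Spending Score (1-100)']

# Synthetic stand-in for the two clustering features (not the real data).
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    features[0]: np.concatenate([rng.normal(25, 3, 40), rng.normal(85, 5, 40)]),
    features[1]: np.concatenate([rng.normal(80, 4, 40), rng.normal(20, 4, 40)]),
})

demo_scaler = StandardScaler()
demo_scaled = demo_scaler.fit_transform(demo[features].to_numpy())

demo_kmeans = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=42)
demo['Cluster'] = demo_kmeans.fit_predict(demo_scaled)

# Map the centroids from standardized units back to k$ and score points.
centroids = pd.DataFrame(
    demo_scaler.inverse_transform(demo_kmeans.cluster_centers_),
    columns=features,
)
print(centroids.round(1))

# Per-cluster size and feature means in the original units.
profile = demo.groupby('Cluster').agg(
    customers=(features[0], 'size'),
    mean_income=(features[0], 'mean'),
    mean_score=(features[1], 'mean'),
)
print(profile.round(1))
```

Because standardization is a linear rescaling, the back-transformed centroids should coincide with the per-cluster means in profile, which gives a quick sanity check on the labels.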

5. Visualizing Customer Segments

Scatter Plot of Customer Segments

plt.figure(figsize=(10, 6))
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=df['Cluster'], palette='viridis', s=100)
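# The plotted values come from X_scaled, so the axes are in standardized units
plt.xlabel('Annual Income (standardized)')
plt.ylabel('Spending Score (standardized)')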
plt.title('Customer Segmentation using K-Means')
plt.show()

Explanation:

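  • The scatter plot shows the customer segments by income and spending behavior, with each color marking one cluster. Because the plot uses X_scaled, both axes are in standardized units rather than k$ and score points.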

6. Conclusion

This Customer Segmentation Using K-Means in Python tutorial demonstrated:

  • Data exploration and preprocessing.
  • Using the Elbow Method to find the optimal number of clusters.
  • Training the K-Means algorithm for segmentation.
  • Visualizing customer segments.

Next Steps:

  • Try adding more features like Age.
  • Experiment with hierarchical clustering or DBSCAN.
  • Use PCA for dimensionality reduction before clustering.
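
As a starting point for the first and third suggestions, here is a rough sketch that adds an age-like third feature, reduces the standardized data to two principal components, and clusters in the reduced space. The data is randomly generated for illustration only, the variable names are hypothetical, and k=5 is simply carried over from this tutorial; on real data it would need to be re-checked with the elbow method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Age, Annual Income (k$), Spending Score (1-100).
rng = np.random.default_rng(2)
X3 = np.column_stack([
    rng.integers(18, 70, size=200),   # age in years
    rng.normal(60, 25, size=200),     # income in k$
    rng.integers(1, 101, size=200),   # spending score
]).astype(float)

# Standardize first so PCA is not dominated by the widest-ranging feature.
X3_scaled = StandardScaler().fit_transform(X3)

# Project the three standardized features onto two principal components.
pca = PCA(n_components=2, random_state=42)
X3_pca = pca.fit_transform(X3_scaled)

# Cluster in the reduced space (k=5 is an assumption, see the note above).
kmeans_pca = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)
labels_pca = kmeans_pca.fit_predict(X3_pca)

print(X3_pca.shape)
print(pca.explained_variance_ratio_.round(3))
```

The explained_variance_ratio_ values indicate how much of the standardized data's variance the two components retain, which helps judge whether a 2-D plot of the clusters is a fair summary.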

Would love to hear your insights! Share your experiences or suggestions in the comments below.
