Customer Segmentation Using K-Means in Python

Customer Segmentation Using K-Means:
Customer segmentation is a key application of machine learning in marketing and business analytics. It allows companies to group customers based on shared characteristics, improving targeting strategies, personalization, and customer satisfaction.
In this tutorial, we will use K-Means clustering, an unsupervised machine learning algorithm, to segment customers based on their purchasing behavior.
Objectives:
- Understand customer segmentation and its importance.
- Preprocess and analyze customer data.
- Implement K-Means clustering to create customer segments.
- Visualize and interpret the results.
Dataset Description
For this project, we will use the Mall Customers Dataset. It contains customer details such as age, spending score, and income, which will help us cluster customers based on shopping patterns.
Columns:
- CustomerID: Unique customer identifier.
- Gender: Male/Female.
- Age: Age of the customer.
- Annual Income (k$): Yearly income of the customer in thousands.
- Spending Score (1-100): Customer spending behavior score assigned by the mall.
Tools & Libraries
We will use the following Python libraries:
- pandas, numpy (Data manipulation)
- matplotlib, seaborn (Visualization)
- scikit-learn (Machine learning implementation)
1. Data Collection & Setup
Importing Libraries and Loading Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load dataset
df = pd.read_csv("Mall_Customers.csv")

# Preview dataset
df.head()
Explanation:
- We import the required libraries.
- Load the dataset using pandas.read_csv().
- Display the first five rows using head().
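Before going further, it can help to confirm that the loaded file really has the columns listed in the dataset description. The sketch below is only an illustration: it reads a tiny in-memory CSV with made-up rows instead of Mall_Customers.csv, and the names csv_text, expected_columns and sample_df are hypothetical.

```python
import io

import pandas as pd

# Made-up rows standing in for Mall_Customers.csv (assumption: same header as the real file)
csv_text = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,24,18,42
2,Female,37,62,77
3,Female,52,95,15
"""

expected_columns = [
    "CustomerID",
    "Gender",
    "Age",
    "Annual Income (k$)",
    "Spending Score (1-100)",
]

sample_df = pd.read_csv(io.StringIO(csv_text))

# Collect any expected column that the loaded frame does not contain
missing = [col for col in expected_columns if col not in sample_df.columns]
print("Missing columns:", missing)
print(sample_df.dtypes)
```

With the real file, the same check can be run on df right after pd.read_csv(); a non-empty missing list usually means the header in your copy of the dataset is spelled differently.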
2. Exploratory Data Analysis (EDA)
Checking for Missing Values
print(df.isnull().sum())
Descriptive Statistics
print(df.describe())
Visualizing Customer Distribution
sns.pairplot(df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']])
plt.show()
Explanation:
- We check for missing values.
- Generate summary statistics using describe().
- Use pair plots to visualize relationships between variables.
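To see what these checks report when values are actually missing, here is a small sketch on a made-up frame (toy is hypothetical and is not the Mall Customers data). Dropping incomplete rows is just one possible way to handle gaps, chosen here for simplicity.

```python
import numpy as np
import pandas as pd

# Made-up data with two deliberately missing values
toy = pd.DataFrame({
    "Age": [23, 45, np.nan, 31],
    "Annual Income (k$)": [20, 75, 54, 88],
    "Spending Score (1-100)": [70, 30, 50, np.nan],
})

# Number of missing values per column
missing_counts = toy.isnull().sum()
print(missing_counts)

# One simple option: drop rows that lack a value in the clustering features
clean = toy.dropna(subset=["Annual Income (k$)", "Spending Score (1-100)"])

# Summary statistics for the remaining rows
print(clean.describe().loc[["mean", "std", "min", "max"]])
```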
3. Data Preprocessing
Selecting Relevant Features
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
Feature Scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Explanation:
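- We select the Annual Income and Spending Score columns for clustering.
- Standardize the data using StandardScaler() so that each feature has zero mean and unit variance. K-Means relies on distances, so without scaling the feature with the larger range would dominate.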
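A quick way to check what standardization does is to inspect the column means and standard deviations afterwards. The sketch below uses randomly generated numbers as a stand-in for the income and spending score columns, so X_demo and scaler_demo are assumptions, not the tutorial's data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two features (assumed ranges, not real data)
rng = np.random.default_rng(0)
income = rng.uniform(15, 140, size=200)
score = rng.uniform(1, 100, size=200)
X_demo = np.column_stack([income, score])

scaler_demo = StandardScaler()
X_demo_scaled = scaler_demo.fit_transform(X_demo)

# After scaling, each column has mean ~0 and standard deviation ~1
col_means = X_demo_scaled.mean(axis=0)
col_stds = X_demo_scaled.std(axis=0)
print(col_means.round(6), col_stds.round(6))

# inverse_transform maps scaled values back to the original units
X_back = scaler_demo.inverse_transform(X_demo_scaled)
```

Keeping the fitted scaler around is useful later, because inverse_transform lets you report results in thousands of dollars and score points instead of standardized units.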
4. Applying K-Means Clustering
Finding Optimal Number of Clusters Using Elbow Method
inertia = []
K = range(1, 11)

for k in K:
    kmeans = KMeans(n_clusters=k, init='k-means++', random_state=42)
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)

plt.figure(figsize=(8, 5))
plt.plot(K, inertia, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method to Determine Optimal Clusters')
plt.show()
Explanation:
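- The elbow method helps choose the number of clusters by plotting inertia (the sum of squared distances from each point to its assigned cluster center) against the number of clusters. Inertia generally keeps falling as clusters are added, so we look for the "elbow" where further clusters bring only a small improvement.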
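To make the definition of inertia concrete, the sketch below recomputes it by hand and compares it with the value scikit-learn reports. It runs on synthetic blobs from make_blobs rather than the mall data, and the names X_demo, km and manual_inertia are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with four groups (assumption: stands in for X_scaled)
X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

km = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=42).fit(X_demo)

# Look up the center assigned to each point, then sum the squared distances
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = np.sum((X_demo - assigned_centers) ** 2)

print(manual_inertia, km.inertia_)
```

The two numbers should agree up to floating-point error, which confirms that inertia_ is simply the within-cluster sum of squared distances.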
Training K-Means Model
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
kmeans.fit(X_scaled)
df['Cluster'] = kmeans.labels_
Explanation:
- We fit the K-Means model with n_clusters=5, the number of clusters chosen from the elbow plot.
- Assign cluster labels to customers.
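Once labels are assigned, a per-cluster summary in the original units makes the segments easier to describe. The following is a sketch on synthetic data with three obvious groups; demo, feature_cols and the group locations are made up, and with the real data you would use df, X_scaled, scaler and kmeans from the steps above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

feature_cols = ["Annual Income (k$)", "Spending Score (1-100)"]

# Synthetic customers drawn around three made-up (income, score) locations
rng = np.random.default_rng(1)
group_centers = np.array([[25, 80], [60, 50], [100, 20]])
points = np.vstack([c + rng.normal(0, 4, size=(50, 2)) for c in group_centers])
demo = pd.DataFrame(points, columns=feature_cols)

scaler_demo = StandardScaler()
scaled = scaler_demo.fit_transform(demo[feature_cols])

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(scaled)
demo["Cluster"] = km.labels_

# Cluster centers converted from standardized units back to k$ and score points
centers_original = pd.DataFrame(
    scaler_demo.inverse_transform(km.cluster_centers_), columns=feature_cols
)

# Average feature values and number of customers per cluster
profile = demo.groupby("Cluster")[feature_cols].mean()
sizes = demo["Cluster"].value_counts().sort_index()

print(centers_original.round(1))
print(sizes)
```

Reading the centers in original units is what lets you name segments, for example a group with high income and low spending score.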
5. Visualizing Customer Segments
Scatter Plot of Customer Segments
plt.figure(figsize=(10, 6))
# X_scaled holds standardized values, so the axes are in standard-deviation units
sns.scatterplot(x=X_scaled[:, 0], y=X_scaled[:, 1], hue=df['Cluster'], palette='viridis', s=100)
plt.xlabel('Annual Income (standardized)')
plt.ylabel('Spending Score (standardized)')
plt.title('Customer Segmentation using K-Means')
plt.show()
Explanation:
- The plot shows each customer as a point colored by its cluster, positioned by standardized income and spending score.
6. Conclusion
This tutorial on customer segmentation using K-Means in Python demonstrated:
- Data exploration and preprocessing.
- Using the Elbow Method to find the optimal number of clusters.
- Training the K-Means algorithm for segmentation.
- Visualizing customer segments.
Next Steps:
- Try adding more features like Age.
- Experiment with hierarchical clustering or DBSCAN.
- Use PCA for dimensionality reduction before clustering.
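As a starting point for the first and last suggestions, the sketch below adds Age as a third feature, standardizes, reduces to two principal components, and then clusters. It is only an illustration on randomly generated data: demo, the value ranges, and the choice of five clusters are assumptions carried over for convenience, not results from the mall dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in for the three numeric columns (assumed ranges, not real data)
rng = np.random.default_rng(7)
n = 200
demo = pd.DataFrame({
    "Age": rng.integers(18, 71, size=n),
    "Annual Income (k$)": rng.integers(15, 138, size=n),
    "Spending Score (1-100)": rng.integers(1, 101, size=n),
})

# Standardize first, then project the three features onto two components
reducer = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = reducer.fit_transform(demo)

# Share of the total variance kept by each component
explained = reducer.named_steps["pca"].explained_variance_ratio_
print(explained)

# Cluster in the reduced space
labels = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X_2d)
print(np.bincount(labels))
```

With only three features PCA is mostly a convenience for plotting; on the real data it is worth rerunning the elbow plot, since the best number of clusters may change once Age is included.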
Would love to hear your insights! Share your experiences or suggestions in the comments below.