Customer Segmentation Using K-Means in Python

Author: KANGKAN KALITA, Data Scientist at LeadTech Group

Passionate about unlocking insights from data, I am a dedicated data scientist with a keen interest in AI and Machine Learning. As a tech enthusiast, I constantly explore new technologies and innovations. My journey is driven by a love for learning and a commitment to leveraging data to create meaningful impact.

Customer Segmentation Using K-Means:

Customer segmentation is a key application of machine learning in marketing and business analytics. It allows companies to group customers based on shared characteristics, improving targeting strategies, personalization, and customer satisfaction.

In this tutorial, we will use K-Means clustering, an unsupervised machine learning algorithm, to segment customers based on their purchasing behavior.
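To make the algorithm concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the classic iteration behind K-Means: assign each point to its nearest center, move each center to the mean of its assigned points, and repeat until the centers stop moving. The function names `assign_to_nearest` and `kmeans_lloyd` are hypothetical, and the sketch uses plain random initialisation rather than the k-means++ seeding used later in this tutorial.

```python
import numpy as np


def assign_to_nearest(X, centers):
    """Return the index of the nearest center (Euclidean) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Illustrative K-Means via Lloyd's algorithm (random init, not k-means++)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise with k distinct rows of X picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center.
        labels = assign_to_nearest(X, centers)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment so the labels match the returned centers.
    labels = assign_to_nearest(X, centers)
    # Inertia: sum of squared distances from each point to its own center.
    inertia = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, inertia


# Two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(10, 0.5, (50, 2))])
labels, centers, inertia = kmeans_lloyd(X, k=2)
print(centers.round(2), round(inertia, 2))
```

In practice, scikit-learn's `KMeans` runs the same assignment/update loop but adds k-means++ seeding and several restarts (`n_init`), keeping the run with the lowest inertia.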

Objectives:

Understand customer segmentation and its importance.
Preprocess and analyze customer data.
Implement K-Means clustering to create customer segments.
Visualize and interpret the results.

Dataset Description

For this project, we will use the Mall Customers Dataset. It contains customer details such as age, spending score, and income, which will help us cluster customers based on shopping patterns.

Columns:

CustomerID: Unique customer identifier.
Gender: Male/Female.
Age: Age of the customer.
Annual Income (k$): Yearly income of the customer in thousands of dollars.
Spending Score (1-100): Customer spending behavior score assigned by the mall.

Tools & Libraries

We will use the following Python libraries:

pandas, numpy (Data manipulation)
matplotlib, seaborn (Visualization)
scikit-learn (Machine learning implementation)

1. Data Collection & Setup

Importing Libraries and Loading Dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load dataset
df = pd.read_csv("Mall_Customers.csv")

# Preview dataset
print(df.head())

Explanation:

Import the required libraries.
Load the dataset using pandas.read_csv().
Display the first five rows using head().

2. Exploratory Data Analysis (EDA)

Checking for Missing Values

print(df.isnull().sum())

Descriptive Statistics

print(df.describe())

Visualizing Customer Distribution

sns.pairplot(df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']])
plt.show()

Explanation:

Check each column for missing values with isnull().sum().
Generate summary statistics using describe().
Use a pair plot to visualize pairwise relationships between age, income, and spending score.

3. Data Preprocessing

Selecting Relevant Features

X = df[['Annual Income (k$)', 'Spending Score (1-100)']]

Feature Scaling

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Explanation:

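Select the Annual Income and Spending Score columns as the clustering features.
Standardize both features with StandardScaler() so each has zero mean and unit variance. K-Means relies on Euclidean distance, so without scaling, whichever feature has the larger spread would dominate the clustering.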
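To see what StandardScaler actually computes, the sketch below standardizes a few made-up income and spending-score pairs (hypothetical values, not rows from the dataset) and checks that the result matches subtracting each column's mean and dividing by its population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical (income in k$, spending score) pairs, for illustration only.
X = np.array([[15.0, 39.0], [16.0, 81.0], [60.0, 50.0], [120.0, 6.0]])

scaled_sklearn = StandardScaler().fit_transform(X)
# Manual z-scores; StandardScaler uses the population std (ddof=0).
scaled_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled_sklearn, scaled_manual))  # True
```

Each scaled column ends up with mean 0 and standard deviation 1, which is what puts income and spending score on an equal footing for the distance calculations.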

4. Applying K-Means Clustering

Finding Optimal Number of Clusters Using Elbow Method

inertia = []
K = range(1, 11)
for k in K:
    kmeans = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)

plt.figure(figsize=(8, 5))
plt.plot(K, inertia, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method to Determine Optimal Clusters')
plt.show()

Explanation:

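The elbow method fits K-Means for k = 1 to 10 and plots inertia (the sum of squared distances from each point to its nearest cluster center) against k. Inertia generally decreases as k grows, so we look for the "elbow": the point after which adding more clusters gives only small reductions.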
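The inertia value the elbow plot tracks can be reproduced by hand. This sketch fits KMeans on synthetic 2-D data (an assumption, standing in for X_scaled) and recomputes inertia from the fitted cluster_centers_ and labels_ attributes.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled features (not the mall data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# Squared Euclidean distance from each point to its assigned center, summed.
manual = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
print(manual, km.inertia_)
```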

Training K-Means Model

kmeans = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)
kmeans.fit(X_scaled)
df['Cluster'] = kmeans.labels_

Explanation:

Fit K-Means with n_clusters=5, the number of clusters chosen from the elbow plot.
Store each customer's cluster label in a new Cluster column.
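Because the model was trained on standardized features, its centroids are in z-score units. One way to interpret the segments, sketched here on a hypothetical random stand-in for Mall_Customers.csv, is to map the centroids back with scaler.inverse_transform and summarise each cluster's size and average income and spending score. On the real data, replace the synthetic DataFrame with the one loaded earlier.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for Mall_Customers.csv: random values, same column names.
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "Annual Income (k$)": rng.integers(15, 140, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})

features = ["Annual Income (k$)", "Spending Score (1-100)"]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[features])

kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42)
df["Cluster"] = kmeans.fit_predict(X_scaled)

# Map centroids from z-scores back to k$ and spending-score units.
centers = pd.DataFrame(scaler.inverse_transform(kmeans.cluster_centers_),
                       columns=features)

# Per-cluster averages in original units, plus the number of customers.
profile = (df.groupby("Cluster")[features].mean()
             .assign(Size=df["Cluster"].value_counts()))
print(centers.round(1))
print(profile.round(1))
```

Reading the profile table row by row (for example, high income with low spending score) is usually what turns cluster numbers into named marketing segments.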

5. Visualizing Customer Segments

Scatter Plot of Customer Segments

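plt.figure(figsize=(10, 6))
# Plot the original (unscaled) values so the axes are in real units;
# the clusters themselves were computed on the scaled data.
sns.scatterplot(x=df['Annual Income (k$)'], y=df['Spending Score (1-100)'],
                hue=df['Cluster'], palette='viridis', s=100)
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('Customer Segmentation using K-Means')
plt.show()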

Explanation:

Plots each customer by annual income and spending score, colored by the assigned cluster.

6. Conclusion

This Customer Segmentation Using K-Means in Python tutorial demonstrated:

Data exploration and preprocessing.
Using the Elbow Method to find the optimal number of clusters.
Training the K-Means algorithm for segmentation.
Visualizing customer segments.

Next Steps:

Try adding more features like Age.
Experiment with hierarchical clustering or DBSCAN.
Use PCA for dimensionality reduction before clustering.
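The first and third suggestions can be combined in one scikit-learn pipeline: scale Age, income, and spending score, project onto two principal components, then cluster. This is a sketch on hypothetical random data with the tutorial's column names; whether clustering in PCA space beats clustering the three scaled features directly is something to check on the real data, not a given.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data with the tutorial's column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(18, 71, size=200),
    "Annual Income (k$)": rng.integers(15, 140, size=200),
    "Spending Score (1-100)": rng.integers(1, 101, size=200),
})
features = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

# Scale -> project to 2 principal components -> cluster, as one pipeline.
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KMeans(n_clusters=5, n_init=10, random_state=42),
)
df["Cluster"] = pipe.fit_predict(df[features])

# Share of the scaled data's variance kept by each principal component.
ratios = pipe.named_steps["pca"].explained_variance_ratio_
print(ratios.round(3))
```

With only three features, PCA here mainly provides a 2-D space that is easy to plot; the explained-variance ratios show how much information that projection discards.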

We'd love to hear your insights! Share your experiences or suggestions in the comments below.
