Data Scientist at LeadTech Group

Passionate about unlocking insights from data, I am a dedicated data scientist with a keen interest in AI and Machine Learning. As a tech enthusiast, I constantly explore new technologies and innovations. My journey is driven by a love for learning and a commitment to leveraging data to create meaningful impact.

Music Recommendation System using Python – Full Project
By KANGKAN KALITA, April 7, 2025

📌 Project Overview

In today’s digital era, recommendation systems play a vital role in helping users discover new music tailored to their taste. From Spotify to YouTube Music, these platforms thrive on personalized experiences driven by machine learning. In this end-to-end project, we will build a Music Recommendation System using Python, covering every phase from data cleaning and exploratory data analysis (EDA) to building a model that suggests music based on user preferences or song similarity.


🎯 Objectives of the Project

To understand how recommendation systems work.
To explore audio/song-related datasets.
To build both content-based and optionally collaborative filtering recommenders.
To evaluate recommendations.
To deploy the project via a simple interface using Streamlit (in the final steps).

🧰 Tools and Libraries Used

Tool/Library	Purpose
Python 3.9+	Primary programming language
Pandas	Data manipulation and preprocessing
NumPy	Numerical operations
Scikit-learn	Machine learning models, metrics, and preprocessing
Spotify API / Kaggle Dataset	Source of song metadata or features
Seaborn & Matplotlib	Data visualization and insights
Streamlit	To build and deploy a user interface
TfidfVectorizer / NearestNeighbors / Cosine Similarity	Core algorithm components

✅ What You Will Learn

How to work with real-world music data
Feature engineering with song metadata and lyrics (if available)
Implementing content-based filtering using cosine similarity
(Optional Advanced Step) Implementing collaborative filtering with matrix factorization
Deploying the recommendation system using Streamlit
Building a professional, SEO-optimized project for portfolio/blog

🗂️ Types of Music Recommendation Systems

Content-Based Filtering: Recommends songs based on metadata/features (genre, mood, etc.) similar to user preferences.
Collaborative Filtering: Recommends based on user behavior and other users with similar taste.
Hybrid Systems: Combines both approaches.

We’ll start with content-based filtering first, using a dataset with song metadata and audio features.
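For contrast with the content-based approach used in this tutorial, here is a minimal sketch of the collaborative idea on a tiny, made-up play-count table. The user names, song names, and counts are hypothetical, and item-item cosine similarity is only one simple way to do collaborative filtering; it is not the method used in the rest of this project.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical play counts: rows are users, columns are songs (made-up data)
plays = pd.DataFrame(
    [[5, 4, 0, 0],
     [4, 5, 0, 1],
     [0, 0, 5, 4],
     [0, 1, 4, 5]],
    index=["user_a", "user_b", "user_c", "user_d"],
    columns=["song_1", "song_2", "song_3", "song_4"],
)

# Item-item similarity: two songs are similar if the same users play them
item_sim = pd.DataFrame(
    cosine_similarity(plays.T.values),
    index=plays.columns,
    columns=plays.columns,
)

def similar_songs(song, top_n=2):
    """Return the songs most similar to `song`, excluding the song itself."""
    return item_sim[song].drop(song).sort_values(ascending=False).head(top_n)

print(similar_songs("song_1"))
```

Notice that no song metadata is involved here: the similarity comes entirely from who listened to what, which is why this approach needs user interaction data that our Kaggle dataset does not provide.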

📦 Dataset Options (Choose One)

Two data sources work for this project:

Option 1: Kaggle Dataset

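Name: Spotify Dataset 1921-2020, 160k+ Tracks
Link: https://www.kaggle.com/datasets/zaheenhamidani/ultimate-spotify-tracks-db
Note: the URL slug above reads “ultimate-spotify-tracks-db”, which does not match the dataset name, so confirm that the file you download contains the columns used in the steps below.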

Option 2: Spotify API

Fetch track data directly from Spotify with the Spotipy library.

This tutorial uses Option 1.

🪜 Project Flow (Step-by-Step Plan)

Step	Description
Step 1	Introduction, tools, objectives, dataset options ✅
Step 2	Dataset loading, cleaning, preprocessing
Step 3	Exploratory Data Analysis (EDA)
Step 4	Feature selection and vectorization
Step 5	Building content-based recommender
Step 6	Optional: Collaborative Filtering with Surprise/Matrix Factorization
Step 7	Evaluation metrics and testing
Step 8	Deploying with Streamlit
Step 9	Final Project Summary

🧠 Why Music Recommendation System using Python?

Using Python makes implementation efficient with the help of extensive libraries. A Music Recommendation System using Python is an excellent project for:

Practicing machine learning
Applying text/vector similarity
Building real-world ML applications
Strengthening your resume/portfolio

We’ll use the Spotify Dataset 1921-2020 (160k+ Tracks) from Kaggle. Next comes Step 2 of the plan: data loading and preprocessing for our Music Recommendation System using Python.

📂 Step 1: Load the Dataset

First, download the CSV file from Kaggle. The file name can differ between dataset versions; the code below assumes tracks.csv, so adjust the path if yours is named differently.

📌 Python Code to Load the Dataset

import pandas as pd

# Load the dataset
df = pd.read_csv('tracks.csv')

# Display the shape and first few rows
print("Dataset Shape:", df.shape)
df.head()

🧾 Step 2: Basic Dataset Exploration

📌 Check data types and missing values

# Dataset info
df.info()

# Check missing values
missing_values = df.isnull().sum()
print("Missing values:\n", missing_values[missing_values > 0])

You may see columns like:

id
name
artists
popularity
duration_ms
explicit
release_date
danceability, energy, tempo, valence, etc.
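Because column names can vary between dataset versions, it may help to check up front that the columns used later in this tutorial are present. The sketch below is one way to do that; the `REQUIRED_COLUMNS` list and the `missing_columns` helper are names made up for this example, and the list is assembled from the code in this post rather than from any official schema.

```python
import pandas as pd

# Columns the later steps rely on (collected from the code in this tutorial)
REQUIRED_COLUMNS = [
    "name", "artists", "popularity", "release_date", "explicit",
    "danceability", "energy", "valence", "acousticness", "tempo",
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Tiny stand-in frame; in the project you would pass the loaded `df`
sample = pd.DataFrame({"name": ["Song"], "artists": ["['Artist']"], "popularity": [50]})
print(missing_columns(sample))
```

If the list printed for your real `df` is not empty, rename or drop the affected steps before continuing.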

🧹 Step 3: Data Cleaning

📌 Drop irrelevant or very sparse columns

# Optional: Drop columns not needed for recommendation
columns_to_drop = ['id', 'uri', 'track_href', 'analysis_url', 'type']
df.drop(columns=columns_to_drop, inplace=True, errors='ignore')

📌 Fill or drop missing values

# Drop rows with missing names or artists
df.dropna(subset=['name', 'artists'], inplace=True)

# Fill missing numerical features with their median
num_cols = df.select_dtypes(include=['float64', 'int64']).columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

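# Convert release_date to datetime format.
# The column can mix formats such as '1921' and '1921-05-03'; format='mixed'
# (pandas 2.0+) parses each value on its own. On older pandas, drop the format argument.
df['release_date'] = pd.to_datetime(df['release_date'], format='mixed', errors='coerce')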

# Drop rows where date conversion failed
df.dropna(subset=['release_date'], inplace=True)

# Reset index
df.reset_index(drop=True, inplace=True)
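The same song can appear more than once in a dataset like this (for example, re-releases). The cleaning code above does not remove such repeats, so here is an optional sketch on made-up rows. Keeping the most popular copy is an assumption of this example, not a rule from the dataset.

```python
import pandas as pd

# Stand-in data with one repeated (name, artists) pair; values are made up
tracks = pd.DataFrame({
    "name": ["Song A", "Song A", "Song B"],
    "artists": ["['X']", "['X']", "['Y']"],
    "popularity": [40, 65, 30],
})

# Keep the most popular copy of each (name, artists) pair
deduped = (
    tracks.sort_values("popularity", ascending=False)
          .drop_duplicates(subset=["name", "artists"], keep="first")
          .reset_index(drop=True)
)
print(deduped)
```

If you apply this to the real `df`, do it before building the TF-IDF matrix so that row positions stay aligned.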

🧠 Step 4: Feature Engineering

We’ll create a new column that combines relevant metadata like name, artists, genre (if present), etc., to create a text-based input for vectorization.

📌 Create a combined feature column

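import ast

# 'artists' is stored as a stringified list, e.g. "['Ed Sheeran']".
# Parse it and join the names; fall back to simple stripping if parsing fails.
def parse_artists(value):
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        parsed = None
    if isinstance(parsed, list):
        return ", ".join(str(artist) for artist in parsed)
    return str(value).strip("[]").replace("'", "")

df['artists'] = df['artists'].apply(parse_artists)

# Combine the text fields used for vectorization
df['combined_features'] = df['name'].astype(str) + " " + df['artists']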

✅ Quick Snapshot of Cleaned Data

df[['name', 'artists', 'popularity', 'combined_features']].head()

✅ Summary of Step 2

In this step of building a Music Recommendation System using Python, we:

Loaded the Spotify dataset (1921–2020)
Explored structure, missing values, and types
Cleaned the data and dropped or imputed missing values
Created a new combined text column (combined_features) that will be used in the recommendation system

🔜 Coming Up in Step 3: EDA (Exploratory Data Analysis)

We will:

Visualize the distribution of release years and popularity
Plot relationships such as energy vs. danceability
Look for trends in the music released between 1921 and 2020

The columns listed above do not include a genre field, so the analysis focuses on general feature distributions and artist-level trends.

Let’s continue with Step 3: Exploratory Data Analysis (EDA) for the Music Recommendation System using Python. This step uncovers patterns and trends in the data that can influence how we design the recommendation logic.

Step 3: Exploratory Data Analysis (EDA)

🎯 Objective of EDA

Understand the distribution of musical features (like danceability, energy, valence, etc.)
Visualize trends over the years (e.g., popularity of music styles or energy levels over time)
Analyze artist or song popularity
Identify any correlations between features

📚 Libraries for EDA

import matplotlib.pyplot as plt
import seaborn as sns

# Set plot style and default figure size
# (sns.set() would replace the ggplot style with seaborn's default theme)
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12, 6)

📊 1. Popularity Distribution

📌 Code

sns.histplot(df['popularity'], bins=30, kde=True)
plt.title('Distribution of Song Popularity')
plt.xlabel('Popularity')
plt.ylabel('Frequency')
plt.show()

✅ Insight

This shows how popularity scores (0 to 100) are distributed across the dataset. In data like this, most tracks typically fall below 60.

📈 2. Songs Released Per Year

📌 Code

# Extract year from release_date
df['year'] = df['release_date'].dt.year

sns.countplot(x='year', data=df[df['year'] >= 2000])
plt.xticks(rotation=90)
plt.title('Number of Songs Released Per Year (Since 2000)')
plt.xlabel('Year')
plt.ylabel('Number of Songs')
plt.show()

✅ Insight

We can identify periods of high music production and popular release years.

🔥 3. Energy vs. Danceability

📌 Code

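# Plot a random sample so the scatter stays readable and fast on 160k+ rows
sample = df.sample(n=min(5000, len(df)), random_state=42)
sns.scatterplot(data=sample, x='energy', y='danceability', hue='explicit', alpha=0.6)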
plt.title('Energy vs. Danceability (Colored by Explicit Content)')
plt.xlabel('Energy')
plt.ylabel('Danceability')
plt.legend(title='Explicit')
plt.show()

✅ Insight

This shows how a track’s energy relates to its danceability, and whether explicit tracks cluster in a particular region of the plot.

🧠 4. Correlation Heatmap of Audio Features

📌 Code

features = ['popularity', 'danceability', 'energy', 'loudness', 'speechiness', 
            'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo']

correlation_matrix = df[features].corr()

sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix of Audio Features')
plt.show()

✅ Insight

Helps determine which features are positively or negatively correlated. For example, energy might be negatively correlated with acousticness.
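A heatmap with ten features has many cells to scan, so it can help to list the strongest pairs directly. The helper below is a small sketch of that idea; `strongest_pairs` is a name invented for this example, and the demo values are made up to stand in for `df[features]`.

```python
import pandas as pd

def strongest_pairs(corr, top_n=5):
    """List each feature pair once, sorted by absolute correlation (strongest first)."""
    cols = list(corr.columns)
    rows = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skips the diagonal
            rows.append((cols[i], cols[j], corr.iloc[i, j]))
    pairs = pd.DataFrame(rows, columns=["feature_a", "feature_b", "correlation"])
    order = pairs["correlation"].abs().sort_values(ascending=False).index
    return pairs.loc[order].head(top_n).reset_index(drop=True)

# Made-up values standing in for df[features]
demo = pd.DataFrame({
    "energy":       [0.9, 0.8, 0.2, 0.1, 0.5],
    "acousticness": [0.1, 0.2, 0.9, 0.8, 0.5],
    "tempo":        [120, 90, 100, 130, 110],
})
print(strongest_pairs(demo.corr(), top_n=3))
```

In the project you would call `strongest_pairs(correlation_matrix)` on the matrix computed above.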

🧑‍🎤 5. Top 10 Most Frequent Artists

📌 Code

# Count most common artists
from collections import Counter

artist_counts = Counter(df['artists'])
top_artists = dict(artist_counts.most_common(10))

sns.barplot(x=list(top_artists.values()), y=list(top_artists.keys()))
plt.title('Top 10 Most Common Artists in Dataset')
plt.xlabel('Number of Songs')
plt.ylabel('Artist')
plt.show()

✅ Summary of Step 3

In this step of the Music Recommendation System using Python, we:

Visualized how song features are distributed
Identified key trends in popularity, artist dominance, and yearly output
Built a foundation to understand how song features correlate—this will guide our recommendation model design

🧭 What’s Next in Step 4?

In Step 4, we will:

Vectorize the combined_features column using TfidfVectorizer
Calculate similarity scores using cosine_similarity
Build a simple content-based recommendation function

Next is Step 4A: a song-to-song, content-based recommender for our Music Recommendation System using Python, built on the Spotify Dataset 1921–2020.

Step 4A: Song-to-Song Recommender

🎯 Goal

Recommend similar songs when a user enters the name of a song, based on textual metadata (song name + artist). Audio features such as danceability, energy, and valence are not part of this similarity score; Step 4B uses them instead.

📚 Libraries Required

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

🔧 Step 1: TF-IDF Vectorization of Combined Features

We’ll use the combined_features column we created earlier.

📌 Code

# Create TF-IDF matrix from combined features
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['combined_features'])

# Check shape of matrix
print("TF-IDF matrix shape:", tfidf_matrix.shape)
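To see what TF-IDF plus cosine similarity does before running it on the full dataset, here is a toy sketch on three made-up strings shaped like `combined_features`. The song and artist names are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up "name + artists" strings in the same shape as combined_features
corpus = [
    "Midnight Drive Nova Lane",
    "Morning Drive Nova Lane",
    "Quiet Harbor Iris Vale",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(corpus)  # sparse matrix: one row per song

# Similarity of the first song to every song, including itself
scores = cosine_similarity(matrix[0], matrix).ravel()
for text, score in zip(corpus, scores):
    print(f"{score:.2f}  {text}")
```

The second string shares three of its four words with the first, so it scores high, while the third shares none and scores zero. This is also the main limitation of the approach: songs are only "similar" when their titles or artist names overlap.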

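🧠 Step 2: Compute Cosine Similarity

A full pairwise similarity matrix for 160k+ songs would hold 160,000 × 160,000 values (roughly 200 GB as 64-bit floats), which will not fit in memory on a typical machine. Instead, we compute the similarity between one query song and all songs on demand.

📌 Code

# Cosine similarity between one song (row idx) and every song in the TF-IDF matrix
def similarity_to_all(idx):
    return cosine_similarity(tfidf_matrix[idx], tfidf_matrix).ravel()

🔍 Step 3: Create a Reverse Index for Song Lookup

To easily find songs by name.

# Make sure row labels match row positions in the TF-IDF matrix
df = df.reset_index(drop=True)

# Map lowercase song name -> row position (keep the first row for repeated names)
indices = pd.Series(df.index, index=df['name'].str.lower())
indices = indices[~indices.index.duplicated(keep='first')]

🎯 Step 4: Define the Recommendation Function

📌 Code

def get_recommendations(song_name, num_recommendations=10):
    song_name = song_name.lower()
    if song_name not in indices.index:
        return f"❌ Song '{song_name}' not found in the dataset."

    idx = indices[song_name]
    sim_scores = similarity_to_all(idx)

    # Highest scores first; take one extra so the query song can be dropped
    order = sim_scores.argsort()[::-1][:num_recommendations + 1]
    song_indices = [i for i in order if i != idx][:num_recommendations]

    return df[['name', 'artists', 'popularity']].iloc[song_indices]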

✅ Example Output

get_recommendations("Shape of You", 5)

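Output (illustrative format only; actual rows depend on the dataset, and because similarity here uses only name and artist text, real results will skew toward the same artist or shared title words):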

	name	artists	popularity
1	Perfect	Ed Sheeran	85
2	Thinking Out Loud	Ed Sheeran	80
3	Photograph	Ed Sheeran	75
4	Love Yourself	Justin Bieber	78
5	Let Her Go	Passenger	70

✨ Enhancement: Add Fuzzy Search (Optional)

To support partial song names like “Shape”, we can integrate fuzzywuzzy or difflib.
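Here is a minimal sketch of the difflib route, using only the standard library. The `closest_titles` helper and the title list are made up for this example; in the project you would pass the unique values of the name column.

```python
import difflib

def closest_titles(query, titles, max_results=3, cutoff=0.6):
    """Return known titles that look most like `query` (case-insensitive)."""
    lowered = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=max_results, cutoff=cutoff
    )
    return [lowered[match] for match in matches]

# Hypothetical titles; in the project you would pass df['name'].unique()
titles = ["Shape of You", "Shape of My Heart", "Perfect", "Photograph"]
print(closest_titles("shape of yu", titles))
```

This handles typos well. A very short fragment such as “Shape” may score below the default cutoff, so for partial names you could lower `cutoff` or fall back to a substring match with `df['name'].str.contains(...)`.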


🔚 Summary of Step 4A

In this part of our Music Recommendation System using Python, we:

Vectorized metadata using TF-IDF
Computed song similarity using cosine similarity
Created a function that returns similar songs by name

🔜 Coming Up: Step 4B – Feature-Based Recommendation System

We’ll allow the user to:

Set values for danceability, energy, valence, etc.
Filter and return top N recommended tracks that match their mood

Step 4B turns this idea into a vibe-based playlist generator. 🎧💃

Step 4B: Feature-Based User Preference Recommender

🎯 Goal

Allow users to set their preferred audio feature ranges, such as:

🎶 Energy
💃 Danceability
😄 Valence (mood)
🎤 Acousticness
⚡ Tempo
🎧 Popularity

…and recommend tracks matching their vibe!

💡 Tools & Concepts Used

Pandas filtering for range selection
Sorting by popularity or feature closeness
Optional: Normalization for accurate matching

📋 Step 1: Define the Function with User Inputs

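Let’s build a simple function where the user enters their desired feature values: between 0 and 1 for energy, danceability, valence, and acousticness; beats per minute for tempo; and 0 to 100 for popularity. Each filter keeps tracks within ±0.1 of the target (±10 for tempo and popularity).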

📌 Code

def vibe_based_recommender(energy=None, danceability=None, valence=None, acousticness=None,
                           tempo=None, popularity=None, top_n=10):
    # Target value and tolerance per feature:
    # +/-0.1 for the 0-1 features, +/-10 for tempo and popularity
    targets = {
        'energy': (energy, 0.1),
        'danceability': (danceability, 0.1),
        'valence': (valence, 0.1),
        'acousticness': (acousticness, 0.1),
        'tempo': (tempo, 10),
        'popularity': (popularity, 10),
    }

    filtered_df = df

    # Apply a range filter only for the features the user actually set
    for column, (target, tolerance) in targets.items():
        if target is not None:
            filtered_df = filtered_df[filtered_df[column].between(target - tolerance, target + tolerance)]

    # Sort by popularity so the best-known matches come first
    filtered_df = filtered_df.sort_values(by='popularity', ascending=False)

    return filtered_df[['name', 'artists', 'energy', 'danceability', 'valence', 'acousticness', 'tempo', 'popularity']].head(top_n)

▶️ Example 1: Energetic and Happy Songs

vibe_based_recommender(energy=0.8, valence=0.9, top_n=5)

Illustrative output (abridged columns; actual rows depend on the dataset):

name	artists	energy	valence	popularity
Happy	Pharrell	0.82	0.94	90
Can’t Stop	Red Hot Chili Peppers	0.85	0.88	88

▶️ Example 2: Chill Acoustic Vibe

vibe_based_recommender(energy=0.3, acousticness=0.8, valence=0.5, top_n=5)

Illustrative output (abridged columns; actual rows depend on the dataset):

name	artists	acousticness	energy	valence
Let Her Go	Passenger	0.85	0.31	0.51
Skinny Love	Bon Iver	0.89	0.29	0.47
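The range filters above can return nothing when the targets are strict. An alternative, mentioned under Tools & Concepts as sorting by feature closeness with normalization, is to rank every track by its distance to the target values. The sketch below assumes Euclidean distance on min-max scaled features; `rank_by_closeness` and the demo rows are invented for this example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def rank_by_closeness(frame, targets, top_n=10):
    """Rank rows by Euclidean distance to `targets` after scaling features to 0-1."""
    columns = list(targets)
    scaler = MinMaxScaler().fit(frame[columns])
    scaled = scaler.transform(frame[columns])
    # Scale the target with the same scaler so tempo (BPM) does not dominate
    scaled_target = scaler.transform(pd.DataFrame([targets], columns=columns))
    distance = np.linalg.norm(scaled - scaled_target, axis=1)
    return frame.assign(distance=distance).nsmallest(top_n, "distance")

# Made-up tracks standing in for df
demo = pd.DataFrame({
    "name": ["A", "B", "C"],
    "energy": [0.80, 0.30, 0.75],
    "valence": [0.90, 0.50, 0.20],
    "tempo": [120.0, 80.0, 125.0],
})
print(rank_by_closeness(demo, {"energy": 0.8, "valence": 0.9, "tempo": 118}, top_n=2))
```

Unlike the filter version, this always returns `top_n` rows (if the dataset has that many), ordered from the closest match outward.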

✨ You Can Customize:

Add more filters (like explicit content, duration, etc.)
Allow user sliders in a web interface using Streamlit or Flask


🔚 Summary of Step 4B

In this part of the Music Recommendation System using Python, we:

✅ Created a vibe-based recommender
✅ Allowed users to choose their audio feature preferences
✅ Returned a popularity-sorted list of matching tracks

This kind of recommender works great for building custom playlists by mood or occasion.

🧭 What’s Next?

As next steps, we can:

Build a Streamlit app for user interaction (UI for both methods)
Deploy the model as a web app
Bonus: Save models/data to a .pkl file and host online
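For the .pkl idea, here is a minimal sketch using Python’s built-in pickle module to store a fitted TfidfVectorizer together with its matrix. The file name `recommender.pkl`, the temporary directory, and the two demo strings are assumptions for this example; in the project you would save the objects fitted on the combined_features column.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer

# Fit on made-up strings; in the project this would be df['combined_features']
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(["Midnight Drive Nova Lane", "Quiet Harbor Iris Vale"])

# Save the fitted vectorizer and the sparse matrix together
path = os.path.join(tempfile.gettempdir(), "recommender.pkl")  # hypothetical file name
with open(path, "wb") as handle:
    pickle.dump({"vectorizer": vectorizer, "matrix": matrix}, handle)

# Later (for example inside the web app): load them back without refitting
with open(path, "rb") as handle:
    artifacts = pickle.load(handle)

print(artifacts["matrix"].shape)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.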


🎧 Music Recommendation System using Python | End-to-End Project with Code


🔥 Introduction: Music Recommendation System using Python

In today’s digital world, music streaming platforms like Spotify, Apple Music, and YouTube Music use machine learning to offer personalized playlists. At the heart of these systems lies a music recommendation system that predicts and suggests tracks users are likely to enjoy.

In this comprehensive project tutorial, we will build a Music Recommendation System using Python from scratch with the Spotify dataset (1921–2020), which contains more than 160,000 tracks.

This project is ideal for:

Data science and machine learning beginners
Python developers looking to apply real-world skills
Music and audio tech enthusiasts

Let’s break it down step-by-step.

📦 Project Overview: Music Recommendation System using Python

Component	Details
Dataset	Spotify 1921–2020, 160,000+ tracks
Tech Stack	Python, Pandas, Scikit-learn, Cosine Similarity
Algorithms	TF-IDF Vectorization, Cosine Similarity
Recommendation Methods	Content-Based & Feature-Based
App Deployment (Optional)	Streamlit for user interface

🔧 Tools and Libraries Used

To build our Music Recommendation System using Python, we used the following libraries:

pip install pandas numpy scikit-learn matplotlib seaborn

Pandas – Data wrangling
Scikit-learn – TF-IDF vectorizer, cosine similarity
Matplotlib/Seaborn – For exploratory data analysis (EDA)

🧹 Step 1: Data Cleaning and Preprocessing

We cleaned the dataset by:

Dropping rows with a missing name, artist, or unparseable release date, and filling missing numeric values with the column median
Creating a new combined_features column by merging song name and artist

📊 Step 2: Exploratory Data Analysis (EDA)

We explored:

Most frequent artists
Audio feature distributions (like energy, danceability, etc.)
Trends over the decades

This gave us key insights into how music has evolved and what users prefer.

🎯 Step 3: Feature Engineering

We engineered a combined_features column like:

df['combined_features'] = df['name'] + " " + df['artists']

This gives TF-IDF a single text field per song, so songs are matched on name and artist text. Audio features are used separately in Step 4B.

🤖 Step 4A: Content-Based Song-to-Song Recommender

In this method:

We used TF-IDF Vectorization on combined_features
Applied cosine similarity to recommend songs similar to a given track

def get_recommendations(song_name, num_recommendations=10):
    # Returns the most similar songs based on name + artist text (full code in Step 4A above)
    ...

✅ Works great for users who like a specific song and want similar vibes.

💃 Step 4B: Feature-Based User Preference Recommender

Here, the user selects desired values like:

energy = 0.8
danceability = 0.9
valence = 0.7

The system filters the dataset to return the top songs matching those features.

def vibe_based_recommender(energy=None, danceability=None, valence=None):
    # Filters songs by matching audio feature ranges (full code in Step 4B above)
    ...

✅ Great for building mood-based playlists.

🚀 Optional: Build a Streamlit App

You can build a fully functional web app using Streamlit, where the user:

Enters a song name and sees similar tracks
Uses sliders to pick mood features and get vibe-based recommendations
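As a starting point, here is a small sketch of the slider half of such an app. The `match_vibe` and `run_app` names are made up for this example, only two sliders are shown, and the data frame is assumed to have the columns used earlier in this post. Streamlit is imported inside `run_app` so the filtering helper can be tried without it.

```python
import pandas as pd

def match_vibe(frame, targets, tolerance=0.1, top_n=10):
    """Keep rows whose features lie within +/- tolerance of each target value."""
    result = frame
    for column, target in targets.items():
        result = result[result[column].between(target - tolerance, target + tolerance)]
    return result.sort_values("popularity", ascending=False).head(top_n)

def run_app(frame):
    """Render a two-slider page; call this at the bottom of app.py."""
    import streamlit as st  # imported here so match_vibe works without Streamlit

    st.title("Music Recommendation System using Python")
    energy = st.slider("Energy", 0.0, 1.0, 0.5, 0.05)
    valence = st.slider("Valence (mood)", 0.0, 1.0, 0.5, 0.05)

    picks = match_vibe(frame, {"energy": energy, "valence": valence})
    st.dataframe(picks[["name", "artists", "energy", "valence", "popularity"]])

# In app.py you would finish with something like:
#     run_app(pd.read_csv("tracks.csv"))
# and start the app with:  streamlit run app.py
```

A text box for the song-to-song recommender could be added the same way with `st.text_input`, passing the entered name to `get_recommendations`.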


🧠 Conclusion: Build Your Own Music Recommendation System using Python

By completing this project, you’ve learned how to build a Music Recommendation System using Python with real-world techniques like:

TF-IDF and cosine similarity
Feature-based filtering
Audio feature analysis

Production platforms like Spotify rely on recommendation engines that are far larger and combine many more signals, but the content-based ideas practiced here are a useful foundation for understanding them.

Whether you’re a data science student, a Python developer, or just a music lover, this project gives you a great head start in ML-based recommender systems.

📝 Key Takeaways

✅ Hands-on project with Spotify’s 160K track dataset
✅ Used content-based and user-feature filtering techniques
✅ Fully coded in Python with clear explanations
✅ Ready for deployment with Streamlit

📂 Project Resources

Dataset: Spotify 1921–2020 Dataset on Kaggle

🧩 What’s Next?

Want more projects like this?

Try:

🎬 Movie Recommendation System using Python
🛍️ E-commerce Product Recommender using Collaborative Filtering
📰 News Article Recommender using NLP

Want to see this turned into a Streamlit app next? Please comment below.

Let’s turn this into a portfolio-worthy masterpiece!
