Movie Recommendation System Project with Source Code

Introduction
Many people now watch movies online rather than on traditional TV, and a recommendation system helps them discover titles that match their preferences. This post walks through building an end-to-end movie recommender system in Python using the MovieLens dataset, with source code at every step.
Objective
- Understand recommendation system concepts.
- Preprocess and analyze the MovieLens dataset.
- Implement different recommendation techniques.
- Build an efficient movie recommender system.
Tools & Libraries
We will use the following Python libraries:
- pandas for data manipulation.
- numpy for numerical computations.
- matplotlib and seaborn for visualization.
- scikit-learn for machine learning models.
- surprise for recommendation algorithms.
1. Importing Libraries and Loading Data
First, install the required libraries if you haven’t already:
!pip install pandas numpy matplotlib seaborn scikit-learn scikit-surprise
Now, import the necessary libraries and load the MovieLens dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse

# Load dataset
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')

print(movies.head())
print(ratings.head())
Explanation:
- The MovieLens dataset consists of movies.csv (movie details) and ratings.csv (user ratings).
- movies.csv includes movieId, title, and genres; ratings.csv includes userId, movieId, rating, and timestamp.
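To see how the two files fit together, here is a minimal sketch that joins ratings to titles on movieId. The rows below are made up for illustration and only mirror the column names; with the real files you would use the movies and ratings DataFrames loaded above.

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (made-up rows, same column names)
movies_demo = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A (1995)", "Movie B (1995)", "Movie C (1996)"],
    "genres": ["Adventure|Comedy", "Drama", "Comedy|Romance"],
})
ratings_demo = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "movieId": [1, 3, 1, 2],
    "rating": [4.0, 3.5, 5.0, 2.0],
    "timestamp": [964982703, 964981247, 964982224, 964983815],
})

# movieId is the shared key: attach title and genres to every rating row
rated = ratings_demo.merge(movies_demo, on="movieId", how="left")

# Average rating and number of ratings per title
summary = (
    rated.groupby("title")["rating"]
    .agg(mean_rating="mean", num_ratings="count")
    .sort_values("num_ratings", ascending=False)
)
print(summary)
```

A left join keeps every rating row even if a movieId were missing from the movie table, which makes such gaps easy to spot as empty titles.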
2. Exploratory Data Analysis (EDA)
Understanding the dataset structure:
movies.info()  # prints its summary directly and returns None
print(ratings.describe())
print(ratings.isnull().sum())
Visualization of Ratings Distribution
plt.figure(figsize=(10, 5))
sns.histplot(ratings['rating'], bins=10, kde=True)
plt.title('Distribution of Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()
Explanation:
- The histogram shows how often each rating value occurs.
- It reveals potential imbalances, such as ratings clustering at the high end of the scale.
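Beyond the histogram, it can help to check how sparse the ratings are, since most users rate only a small share of the movies. The sketch below uses made-up rows with the same columns as ratings.csv; the variable names are illustrative.

```python
import pandas as pd

# Made-up ratings with the same columns as ratings.csv
ratings_demo = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3],
    "movieId": [10, 20, 30, 10, 40, 20],
    "rating":  [4.0, 3.5, 5.0, 4.5, 2.0, 4.0],
})

# How many times each rating value was given
rating_counts = ratings_demo["rating"].value_counts().sort_index()

# Fraction of user-movie pairs that have no rating at all
n_users = ratings_demo["userId"].nunique()
n_movies = ratings_demo["movieId"].nunique()
sparsity = 1 - len(ratings_demo) / (n_users * n_movies)

print(rating_counts)
print(f"Users: {n_users}, movies: {n_movies}, sparsity: {sparsity:.2%}")
```

A high sparsity value is one reason collaborative filtering models learn compact user and movie representations instead of working with the full user-movie table.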
3. Data Preprocessing
Handling Missing Values
movies.dropna(inplace=True)
ratings.dropna(inplace=True)
Encoding and Preparing Data
reader = Reader(rating_scale=(0.5, 5.0))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
Explanation:
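- We drop rows with missing values so the model trains on clean data.
- Reader tells Surprise the rating scale (0.5 to 5.0 here), and Dataset.load_from_df expects the columns in the order user, item, rating.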
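Before building the Reader, it may be worth confirming that the cleaned ratings actually fall inside the scale you pass in. The sketch below uses made-up rows; removing duplicate (user, movie) pairs is an optional extra shown as an assumption, not one of the steps above.

```python
import pandas as pd

# Made-up ratings, including one missing value and one duplicate (user, movie) pair
ratings_demo = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 2],
    "movieId": [10, 20, 10, 10, 30],
    "rating":  [4.0, None, 3.0, 3.5, 5.0],
})

# Drop rows with a missing rating, then keep the last rating per (user, movie)
clean = (
    ratings_demo.dropna(subset=["rating"])
    .drop_duplicates(subset=["userId", "movieId"], keep="last")
)

# The observed range should sit inside the scale given to Reader
observed_range = (float(clean["rating"].min()), float(clean["rating"].max()))
in_range = bool(clean["rating"].between(0.5, 5.0).all())
print(observed_range, in_range)
```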
4. Building a Collaborative Filtering Model
Splitting Data
trainset, testset = train_test_split(data, test_size=0.2)
Implementing Singular Value Decomposition (SVD)
model = SVD()
model.fit(trainset)

# Predictions
y_pred = model.test(testset)
Evaluating Model
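# rmse() prints the score itself by default; verbose=False avoids printing it twice
rmse_score = rmse(y_pred, verbose=False)
print(f'RMSE: {rmse_score:.4f}')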
Explanation:
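- SVD (Singular Value Decomposition) here is a matrix factorization approach to collaborative filtering: it learns latent factors for users and movies from the known ratings and uses them to predict the missing ones.
- The dataset is split into training (80%) and testing (20%).
- Root Mean Squared Error (RMSE) measures how far predicted ratings are from the actual ones on the test set; lower is better.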
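To make these bullets concrete, here is a rough NumPy sketch of the idea behind an SVD-style recommender: a predicted rating is the global mean plus a user bias, a movie bias, and the dot product of two latent factor vectors, all learned by stochastic gradient descent. The data, factor count, learning rate, and regularization values are made up for illustration, and this is a simplified teaching version, not Surprise's actual implementation.

```python
import numpy as np

# Made-up (user index, movie index, rating) triples
train = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 4.5)]
n_users, n_items, n_factors = 3, 3, 2

rng = np.random.default_rng(0)
mu = np.mean([r for _, _, r in train])          # global mean rating
b_u = np.zeros(n_users)                         # user biases
b_i = np.zeros(n_items)                         # movie biases
P = rng.normal(0, 0.1, (n_users, n_factors))    # user latent factors
Q = rng.normal(0, 0.1, (n_items, n_factors))    # movie latent factors


def predict(u, i):
    """Predicted rating: global mean + biases + factor dot product."""
    return mu + b_u[u] + b_i[i] + P[u] @ Q[i]


def rmse(samples):
    """Root mean squared error over (user, movie, rating) triples."""
    errors = [r - predict(u, i) for u, i, r in samples]
    return float(np.sqrt(np.mean(np.square(errors))))


rmse_before = rmse(train)
lr, reg = 0.01, 0.02  # illustrative learning rate and regularization
for _ in range(200):
    for u, i, r in train:
        err = r - predict(u, i)
        b_u[u] += lr * (err - reg * b_u[u])
        b_i[i] += lr * (err - reg * b_i[i])
        # Update both factor vectors from their pre-update values
        P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                      Q[i] + lr * (err * P[u] - reg * Q[i]))
rmse_after = rmse(train)
print(f"Training RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
```

The RMSE here is measured on the training triples only, to show the error shrinking as the factors are learned. In the tutorial the score comes from the held-out 20% test set, which is the number that tells you how well the model generalizes.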
5. Making Recommendations
Function to Get Movie Recommendations
def recommend_movies(user_id, model, movies, ratings, num_recommendations=5):
    # Keep only movies this user has not rated yet; copy() avoids
    # pandas' SettingWithCopyWarning when adding a column below
    rated_ids = ratings[ratings['userId'] == user_id]['movieId']
    unique_movies = movies[~movies['movieId'].isin(rated_ids)].copy()
    # Predict the user's rating for each remaining movie
    unique_movies['predicted_rating'] = unique_movies['movieId'].apply(
        lambda x: model.predict(user_id, x).est
    )
    recommendations = unique_movies.sort_values(
        by='predicted_rating', ascending=False
    ).head(num_recommendations)
    return recommendations[['title', 'predicted_rating']]

# Get recommendations for user ID 1
recommendations = recommend_movies(1, model, movies, ratings)
print(recommendations)
Explanation:
- This function suggests the top N movies a user might like.
- It filters out movies the user has already rated.
- It ranks the remaining movies by predicted rating and returns the highest-scoring ones.
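To try the filtering and ranking logic without training a model, here is a small sketch that swaps in a stand-in. StubModel and Prediction are hypothetical helpers written for this example; they only mimic the predict(user_id, movie_id).est interface used above and return fixed, made-up scores.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for a trained model: exposes predict(uid, iid).est
Prediction = namedtuple("Prediction", ["uid", "iid", "est"])


class StubModel:
    """Returns a fixed, made-up score per movie so the ranking is predictable."""

    def __init__(self, scores):
        self.scores = scores

    def predict(self, uid, iid):
        return Prediction(uid, iid, self.scores.get(iid, 3.0))


def recommend_movies(user_id, model, movies, ratings, num_recommendations=5):
    # Movies the user has already rated are excluded from the candidates
    seen = ratings.loc[ratings["userId"] == user_id, "movieId"]
    candidates = movies[~movies["movieId"].isin(seen)].copy()
    candidates["predicted_rating"] = candidates["movieId"].apply(
        lambda movie_id: model.predict(user_id, movie_id).est
    )
    top = candidates.sort_values("predicted_rating", ascending=False)
    return top.head(num_recommendations)[["title", "predicted_rating"]]


movies_demo = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
})
ratings_demo = pd.DataFrame({"userId": [1, 2], "movieId": [1, 2], "rating": [4.0, 3.0]})
model_demo = StubModel({1: 5.0, 2: 4.5, 3: 2.5, 4: 4.0})

top_demo = recommend_movies(1, model_demo, movies_demo, ratings_demo, num_recommendations=2)
print(top_demo)
```

Movie A has the highest stub score, but user 1 has already rated it, so it never appears in the result.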
6. Content-Based Filtering (Optional)
Another approach is to recommend movies based on their content, such as genres or descriptions. The code here uses the genres column from movies.csv.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# TF-IDF Vectorizer for genres
# Note: the default tokenizer splits on punctuation, so 'Sci-Fi' becomes 'sci' and 'fi'
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(movies['genres'].fillna(''))

# Compute similarity between every pair of movies
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
Explanation:
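- TF-IDF Vectorizer converts movie genres into numerical features.
- Cosine similarity compares those features: cosine_sim[i][j] scores how similar movies i and j are, with values closer to 1 meaning more genre overlap.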
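A similarity matrix becomes a recommender once you look up the most similar rows for a given movie. The sketch below shows that lookup on made-up movies. To stay dependency-light it swaps TF-IDF for plain multi-hot genre vectors and computes cosine similarity with NumPy, and similar_movies is a hypothetical helper name; the same lookup should work with the cosine_sim matrix computed above.

```python
import numpy as np
import pandas as pd

movies_demo = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genres": ["Action|Sci-Fi", "Action|Sci-Fi|Thriller", "Comedy|Romance", "Comedy"],
})

# One column per genre, 1 if the movie has that genre (splits on the '|' separator)
genre_matrix = movies_demo["genres"].str.get_dummies(sep="|").to_numpy(dtype=float)

# Cosine similarity: dot products of L2-normalized rows
# (assumes every movie has at least one genre, so no row norm is zero)
norms = np.linalg.norm(genre_matrix, axis=1, keepdims=True)
unit_rows = genre_matrix / norms
cosine_sim_demo = unit_rows @ unit_rows.T


def similar_movies(title, movies, sim, top_n=2):
    """Return the top_n titles most similar to `title`, excluding the movie itself.

    Assumes `movies` has a default 0..n-1 index that lines up with the rows of `sim`.
    """
    idx = movies.index[movies["title"] == title][0]
    order = np.argsort(-sim[idx], kind="stable")  # highest similarity first
    order = [i for i in order if i != idx][:top_n]
    return movies["title"].iloc[order].tolist()


print(similar_movies("Movie A", movies_demo, cosine_sim_demo))
```

Unlike the SVD model, this lookup needs no ratings at all, so it can still suggest movies for a brand-new title that nobody has rated yet.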
Conclusion
- We built a movie recommendation system on the MovieLens dataset, with source code for each step.
- We used collaborative filtering (SVD) and content-based filtering.
- The SVD model recommends movies to each user based on their past ratings.
For further improvements, you can try hybrid models combining both approaches.
Do you want more advanced recommendation techniques? Let me know in the comments!