Movie Recommendation System Project with Source Code

KANGKAN KALITA

Introduction:

As more viewing moves from traditional TV to online streaming, catalogs have grown far too large to browse by hand. A movie recommendation system helps users discover movies that match their preferences. This post walks you through building an end-to-end movie recommender with the MovieLens dataset, with full source code.

Objective

  • Understand recommendation system concepts.
  • Preprocess and analyze the MovieLens dataset.
  • Implement different recommendation techniques.
  • Build an efficient movie recommender system.

Tools & Libraries

We will use the following Python libraries:

  • pandas for data manipulation.
  • numpy for numerical computations.
  • matplotlib and seaborn for visualization.
  • scikit-learn for machine learning models.
  • surprise (installed from PyPI as scikit-surprise) for recommendation algorithms.

1. Importing Libraries and Loading Data

First, install the required libraries if you haven’t already:

# The Surprise library is published on PyPI as scikit-surprise
!pip install pandas numpy matplotlib seaborn scikit-learn scikit-surprise

Now, import the necessary libraries and load the MovieLens dataset:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse

# Load dataset
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')
print(movies.head())
print(ratings.head())

Explanation:

  • The MovieLens dataset consists of movies.csv (movie details) and ratings.csv (user ratings).
  • movies.csv contains movieId, title, and genres; ratings.csv contains userId, movieId, rating, and timestamp.
  • The two files are linked through the movieId column.
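
To make the link between the two files concrete, here is a minimal sketch that joins ratings to movie titles on movieId. The tiny movies_demo and ratings_demo frames are made-up stand-ins for the real CSV files, so the numbers are purely illustrative.

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (values invented for illustration)
movies_demo = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "genres": [
        "Adventure|Animation|Children",
        "Adventure|Children|Fantasy",
        "Action|Crime|Thriller",
    ],
})
ratings_demo = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [1, 3, 1, 2, 3],
    "rating": [4.0, 5.0, 3.5, 2.0, 4.5],
    "timestamp": [964982703, 964982224, 1445714835, 1445714885, 1305696483],
})

# movieId is the key shared by both tables
merged = ratings_demo.merge(movies_demo, on="movieId", how="left")

# Average rating and number of ratings per title
summary = (
    merged.groupby("title")["rating"]
    .agg(mean_rating="mean", num_ratings="count")
    .sort_values("num_ratings", ascending=False)
)
print(summary)
```

The same merge should work on the real movies and ratings frames loaded above, since they share the movieId column.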

2. Exploratory Data Analysis (EDA)

Understanding the dataset structure:

print(movies.info())
print(ratings.describe())
print(ratings.isnull().sum())

Visualization of Ratings Distribution

plt.figure(figsize=(10,5))
sns.histplot(ratings['rating'], bins=10, kde=True)
plt.title('Distribution of Ratings')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()

Explanation:

  • The histogram shows how ratings are spread across the rating scale.
  • It helps identify imbalances, for example when most ratings cluster at the high end of the scale.
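
Beyond the histogram, it can help to check how sparse the ratings are and how activity is spread across users and movies. The sketch below uses a made-up ratings_demo frame and assumes each user rates a movie at most once; swap in the real ratings frame to get actual figures.

```python
import pandas as pd

# Toy ratings (invented values); each (userId, movieId) pair appears once
ratings_demo = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3],
    "movieId": [10, 20, 30, 10, 30, 30],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5, 4.0],
})

n_users = ratings_demo["userId"].nunique()
n_movies = ratings_demo["movieId"].nunique()

# Share of user-movie pairs that have no rating
sparsity = 1 - len(ratings_demo) / (n_users * n_movies)

# How many ratings each user gave and each movie received
ratings_per_user = ratings_demo.groupby("userId").size()
ratings_per_movie = ratings_demo.groupby("movieId").size()

print(f"Users: {n_users}, movies: {n_movies}, sparsity: {sparsity:.2%}")
print(ratings_per_user)
print(ratings_per_movie)
```

A high sparsity value is normal for rating data and is one reason matrix-factorization models are a common choice.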

3. Data Preprocessing

Handling Missing Values

movies.dropna(inplace=True)
ratings.dropna(inplace=True)

Encoding and Preparing Data

reader = Reader(rating_scale=(0.5, 5.0))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

Explanation:

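  • We drop rows with missing values so the model trains on complete records.
  • Reader tells Surprise the rating scale (0.5 to 5.0 here).
  • Dataset.load_from_df expects the columns in the order user, item, rating, which is why we select userId, movieId, and rating in that order.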

4. Building a Collaborative Filtering Model

Splitting Data

# random_state makes the split reproducible between runs
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

Implementing Singular Value Decomposition (SVD)

model = SVD()
model.fit(trainset)

# Predictions
y_pred = model.test(testset)

Evaluating Model

# verbose=False stops rmse() from printing the score a second time
rmse_score = rmse(y_pred, verbose=False)
print(f'RMSE: {rmse_score:.4f}')

Explanation:

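  • SVD here is Surprise's matrix-factorization algorithm for collaborative filtering. It learns latent factors for users and movies from the observed ratings.
  • The dataset is split into training (80%) and testing (20%).
  • Root Mean Squared Error (RMSE) measures how far predicted ratings are from the actual ratings on the test set. Lower is better.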
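
To show what happens behind model.fit and rmse, here is a small NumPy sketch of biased matrix factorization trained with stochastic gradient descent, using the prediction rule mu + b_u + b_i + p_u · q_i. It is a simplified illustration in the spirit of Surprise's SVD, not its actual implementation. The helper names (rmse_np, fit_mf, predict_mf), the toy ratings, and the hyperparameters are all assumptions made for this example.

```python
import numpy as np

def rmse_np(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def fit_mf(triples, n_users, n_items, n_factors=2, n_epochs=200,
           lr=0.01, reg=0.02, seed=0):
    """Hypothetical helper: learn biases and latent factors with SGD."""
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in triples]))  # global mean rating
    bu = np.zeros(n_users)                           # user biases
    bi = np.zeros(n_items)                           # item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))     # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))     # item factors
    for _ in range(n_epochs):
        for u, i, r in triples:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu_old = P[u].copy()  # use the pre-update user factors for Q
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu_old - reg * Q[i])
    return mu, bu, bi, P, Q

def predict_mf(params, u, i):
    """Predicted rating for user index u and item index i."""
    mu, bu, bi, P, Q = params
    return float(mu + bu[u] + bi[i] + P[u] @ Q[i])

# Toy (user_index, item_index, rating) triples, invented for illustration
triples = [(0, 0, 5.0), (0, 1, 3.0), (0, 2, 4.0), (1, 0, 4.0), (1, 2, 3.5),
           (2, 1, 2.0), (2, 2, 3.0), (3, 0, 4.5), (3, 1, 2.5)]
params = fit_mf(triples, n_users=4, n_items=3)

actual = [r for _, _, r in triples]
baseline_rmse = rmse_np(actual, [params[0]] * len(actual))  # always predict the mean
model_rmse = rmse_np(actual, [predict_mf(params, u, i) for u, i, _ in triples])
print(f"Baseline RMSE: {baseline_rmse:.3f}, factor model RMSE: {model_rmse:.3f}")
```

Note that this sketch measures error on the training ratings only, to keep it short. The Surprise code above evaluates on a held-out test set, which is the number you should report.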

5. Making Recommendations

Function to Get Movie Recommendations

def recommend_movies(user_id, model, movies, ratings, num_recommendations=5):
    # Movies this user has already rated
    rated_ids = ratings.loc[ratings['userId'] == user_id, 'movieId']
    # Copy the unrated movies so adding a column does not trigger SettingWithCopyWarning
    unique_movies = movies[~movies['movieId'].isin(rated_ids)].copy()
    # Predict a rating for every movie the user has not rated yet
    unique_movies['predicted_rating'] = unique_movies['movieId'].apply(lambda x: model.predict(user_id, x).est)
    recommendations = unique_movies.sort_values(by='predicted_rating', ascending=False).head(num_recommendations)
    return recommendations[['title', 'predicted_rating']]

# Get recommendations for user ID 1
recommendations = recommend_movies(1, model, movies, ratings)
print(recommendations)

Explanation:

  • This function suggests the top N movies a user might like.
  • It filters out movies the user has already rated.
  • It ranks the remaining movies by predicted rating and returns the highest-scoring ones.
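
To see the filter-then-rank logic in isolation, the sketch below runs the same steps on toy data with a stand-in model. ItemMeanModel is a hypothetical class written for this example. It only mimics the predict(uid, iid).est interface and simply returns each movie's mean rating, so it is not a real recommender.

```python
from collections import namedtuple
import pandas as pd

# Minimal stand-in for a prediction object that exposes an .est attribute
Prediction = namedtuple("Prediction", ["est"])

class ItemMeanModel:
    """Hypothetical stand-in model: predicts each movie's mean rating."""
    def __init__(self, ratings):
        self.global_mean = ratings["rating"].mean()
        self.item_means = ratings.groupby("movieId")["rating"].mean().to_dict()

    def predict(self, uid, iid):
        # Fall back to the global mean for movies without any ratings
        return Prediction(est=self.item_means.get(iid, self.global_mean))

# Toy data, invented for illustration
movies_demo = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
})
ratings_demo = pd.DataFrame({
    "userId": [1, 2, 2, 3, 3],
    "movieId": [1, 2, 3, 2, 3],
    "rating": [4.0, 5.0, 3.0, 4.0, 2.0],
})
model_demo = ItemMeanModel(ratings_demo)

user_id = 1
# Step 1: drop movies the user has already rated
seen = set(ratings_demo.loc[ratings_demo["userId"] == user_id, "movieId"])
candidates = movies_demo[~movies_demo["movieId"].isin(seen)].copy()
# Step 2: score the remaining movies and keep the best ones
candidates["predicted_rating"] = [
    model_demo.predict(user_id, m).est for m in candidates["movieId"]
]
top = candidates.sort_values("predicted_rating", ascending=False).head(2)
print(top[["title", "predicted_rating"]])
```

User 1 has already rated Movie A, so it never appears in the output, whatever its score.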

6. Content-Based Filtering (Optional)

Another approach is to recommend movies based on their content, such as genres or descriptions. Here we use the genres column, since that is what movies.csv provides.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

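# TF-IDF Vectorizer for genres. Genres are pipe-separated (e.g. Action|Sci-Fi),
# so we split on '|' to keep hyphenated genres such as Sci-Fi as single tokens
vectorizer = TfidfVectorizer(token_pattern=r'[^|]+')
tfidf_matrix = vectorizer.fit_transform(movies['genres'].fillna(''))

# Compute similarity. The result is a dense n_movies x n_movies matrix,
# so memory use grows quickly with larger MovieLens variants
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)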

Explanation:

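  • TF-IDF Vectorizer converts each movie's genres into a numerical feature vector, giving rarer genres more weight.
  • Cosine similarity compares those vectors: cosine_sim[i][j] scores how similar movie i and movie j are, and higher values mean more genre overlap.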
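
The similarity matrix only becomes useful once we look up a movie's nearest neighbours. Here is a minimal sketch of that lookup on toy data. The function name get_similar_movies and the movies_demo frame are assumptions made for this example, and the code assumes titles are unique.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy movie table in the same shape as movies.csv (title and genres only)
movies_demo = pd.DataFrame({
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)", "Antz (1998)"],
    "genres": [
        "Adventure|Animation|Children|Comedy",
        "Adventure|Children|Fantasy",
        "Action|Crime|Thriller",
        "Adventure|Animation|Children|Comedy",
    ],
})

# Split genres on '|' so each genre becomes one token
vectorizer_demo = TfidfVectorizer(token_pattern=r"[^|]+")
tfidf_demo = vectorizer_demo.fit_transform(movies_demo["genres"])
sim_demo = cosine_similarity(tfidf_demo)

# Map each title to its row position in the similarity matrix
title_to_idx = pd.Series(movies_demo.index, index=movies_demo["title"])

def get_similar_movies(title, top_n=2):
    """Hypothetical helper: return the top_n titles most similar to `title`."""
    idx = title_to_idx[title]
    scores = pd.Series(sim_demo[idx], index=movies_demo["title"]).drop(title)
    return scores.sort_values(ascending=False).head(top_n)

print(get_similar_movies("Toy Story (1995)"))
```

Because this approach only needs movie metadata, it can suggest titles even for movies that have few or no ratings.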

Conclusion

  • We built a movie recommendation system on the MovieLens dataset, with full source code.
  • We used collaborative filtering (SVD) and, as an optional extra, content-based filtering on genres.
  • The collaborative model recommends movies to each user based on their past ratings.

For further improvements, you can try hybrid models combining both approaches.


Do you want more advanced recommendation techniques? Let me know in the comments!
