Electric Vehicle Popularity Analysis Using Machine Learning

Electric Vehicle Popularity Analysis Using Machine Learning:
The adoption of electric vehicles (EVs) is growing rapidly worldwide, driven by technological advancements and environmental concerns. Understanding the factors that influence EV popularity can help manufacturers, policymakers, and researchers plan better. In this project, we explore EV-related data to uncover trends, then build predictive models of EV adoption.
Objective:
- Perform an analysis of electric vehicle adoption using machine learning models.
- Explore the dataset to identify key factors influencing EV popularity.
- Build predictive models to analyze EV adoption trends and evaluate their accuracy.
Dataset:
The dataset contains columns such as VIN (1-10), County, City, State, Postal Code, Model Year, Make, Model, Electric Vehicle Type, Clean Alternative Fuel Vehicle (CAFV) Eligibility, Electric Range, Base MSRP, Legislative District, Vehicle Location, and more. These features will be used for analysis. Click here to Download the Dataset.
Tools & Libraries:
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- XGBoost
- Jupyter Notebook or Google Colab (Recommended)
1. Data Collection & Setup
Import Libraries and Load Dataset
    # Import necessary libraries
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder, StandardScaler
    from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

    # Load the dataset
    data = pd.read_csv('/path/to/ev_data.csv')  # Replace with the actual file path

    # Preview the dataset
    data.head()

Explanation:
- The dataset is loaded using pandas, and the first few rows are displayed to understand its structure. Common columns include Electric Range, Base MSRP, and other EV-related features.
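Before moving on, it can help to confirm that the columns used in later steps are actually present in the loaded file. The following is a minimal sketch, not part of the original project: `check_columns` and `EXPECTED_COLUMNS` are hypothetical names, and the tiny in-memory frame with made-up values stands in for the real CSV.

```python
import pandas as pd

# Columns that later steps in this walkthrough reference by name.
EXPECTED_COLUMNS = [
    "VIN (1-10)", "County", "State", "Model Year", "Make", "Model",
    "Electric Vehicle Type", "Electric Range", "Base MSRP",
]

def check_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are missing from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]

# Tiny stand-in for the real file (illustrative values only).
sample = pd.DataFrame({
    "VIN (1-10)": ["5YJ3E1EA0J", "1N4AZ0CP5D"],
    "Make": ["TESLA", "NISSAN"],
    "Model": ["MODEL 3", "LEAF"],
    "Electric Range": [215, 75],
})

missing = check_columns(sample)
print("Missing columns:", missing)
```

With the real data, an empty list would indicate that every column named in the steps below is available.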
2. Exploratory Data Analysis (EDA)
Dataset Overview
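    # Display dataset information
    data.info()                   # column types and non-null counts
    print(data.describe())        # summary statistics for numeric columns
    print(data.isnull().sum())    # missing values per column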



Electric Range Distribution
    # Visualize the distribution of Electric Range
    sns.histplot(data['Electric Range'], kde=True, color='green')
    plt.title('Distribution of Electric Range')
    plt.xlabel('Electric Range (in miles)')
    plt.ylabel('Frequency')
    plt.show()

Correlation Analysis
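    # Correlation heatmap
    plt.figure(figsize=(12, 8))
    # numeric_only=True skips text columns such as Make and City,
    # which would otherwise raise an error in recent pandas versions
    sns.heatmap(data.corr(numeric_only=True), annot=True, cmap='coolwarm')
    plt.title('Correlation Heatmap')
    plt.show()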

Base MSRP vs. Electric Range
    # Scatter plot for Base MSRP vs. Electric Range
    plt.figure(figsize=(10, 6))
    sns.scatterplot(x='Base MSRP', y='Electric Range', hue='Electric Vehicle Type', data=data, palette='viridis')
    plt.title('Base MSRP vs. Electric Range by Vehicle Type')
    plt.xlabel('Base MSRP (in USD)')
    plt.ylabel('Electric Range (in miles)')
    plt.show()

Explanation:
- EDA helps identify missing values, understand the distribution of key features, and uncover relationships between variables such as Base MSRP and Electric Range.
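The same checks can also be read as plain numbers rather than plots. The sketch below assumes a small synthetic frame with made-up values in place of the EV data. It reports the share of missing values per column and the Pearson correlation between the two numeric columns.

```python
import numpy as np
import pandas as pd

# Small synthetic frame standing in for the EV data (values are made up).
df = pd.DataFrame({
    "Make": ["A", "B", "A", "C", "B"],
    "Electric Range": [200.0, 80.0, np.nan, 150.0, 30.0],
    "Base MSRP": [40000.0, 30000.0, 45000.0, np.nan, 25000.0],
})

# Share of missing values per column, as a percentage.
missing_pct = df.isnull().mean().mul(100).round(1)
print(missing_pct)

# Pearson correlation restricted to numeric columns; text columns such as
# Make are left out instead of causing an error.
corr = df.select_dtypes(include="number").corr()
print(corr.loc["Base MSRP", "Electric Range"])
```

Note that `corr()` uses only the rows where both values are present, so a correlation computed before imputation may differ from one computed afterwards.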
3. Data Cleaning
Handle Missing Values
    # Fill missing values in critical columns
    data['Electric Range'] = data['Electric Range'].fillna(data['Electric Range'].median())
    data['Base MSRP'] = data['Base MSRP'].fillna(data['Base MSRP'].mean())

    # Drop rows with missing `Model` or `Make`
    data.dropna(subset=['Model', 'Make'], inplace=True)
Remove Outliers
    # Remove outliers in `Electric Range` and `Base MSRP`
    numerical_cols = ['Electric Range', 'Base MSRP']
    for col in numerical_cols:
        upper_limit = data[col].mean() + 3 * data[col].std()
        lower_limit = data[col].mean() - 3 * data[col].std()
        data = data[(data[col] >= lower_limit) & (data[col] <= upper_limit)]
Explanation:
- Missing values in Electric Range are filled with the median, and missing values in Base MSRP with the mean. Rows missing Model or Make are dropped. Values more than three standard deviations from the column mean are treated as outliers and removed, which may improve model performance.
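The three-standard-deviation rule can be wrapped in a small function and tried on synthetic data to see what it removes. This is a sketch under stated assumptions: `remove_outliers_3sigma` is a hypothetical helper, the data is randomly generated, and unlike the loop above it computes all limits on the unfiltered data before dropping any rows.

```python
import numpy as np
import pandas as pd

def remove_outliers_3sigma(df, columns):
    """Keep rows within mean +/- 3*std for every listed column (hypothetical helper)."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        mean, std = df[col].mean(), df[col].std()
        mask &= df[col].between(mean - 3 * std, mean + 3 * std)
    return df[mask]

# 200 ordinary values plus one extreme value (synthetic data).
rng = np.random.default_rng(0)
values = rng.normal(loc=100, scale=10, size=200)
df = pd.DataFrame({"Electric Range": np.append(values, 1000.0)})

# Introduce one missing value, then impute it with the median.
df.loc[0, "Electric Range"] = np.nan
df["Electric Range"] = df["Electric Range"].fillna(df["Electric Range"].median())

cleaned = remove_outliers_3sigma(df, ["Electric Range"])
print(len(df), "->", len(cleaned))
```

Because a single extreme value inflates both the mean and the standard deviation, this rule tends to catch only very large deviations; a rule based on the interquartile range is a common alternative when the data is heavily skewed.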
4. Data Preprocessing & Feature Engineering
Encode Categorical Features
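    # Encode categorical variables, keeping one fitted encoder per column
    # so the integer codes can be mapped back to labels later
    encoders = {}
    categorical_cols = ['Make', 'Model', 'Electric Vehicle Type', 'County', 'State']
    for col in categorical_cols:
        encoders[col] = LabelEncoder()
        # astype(str) avoids errors when a column mixes strings and NaN
        data[col] = encoders[col].fit_transform(data[col].astype(str))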
Feature Scaling
    # Scale numerical features
    scaler = StandardScaler()
    numerical_cols = ['Electric Range', 'Base MSRP']
    data[numerical_cols] = scaler.fit_transform(data[numerical_cols])
Train-Test Split
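    # Split data into features and target
    # 'Sales' is a placeholder: replace it with the actual target column
    target = 'Sales'
    X = data.drop(columns=['VIN (1-10)', target])
    # Keep numeric columns only; text columns that were not encoded
    # (for example City) would break model fitting. Fill or drop any
    # remaining NaN values before training.
    X = X.select_dtypes(include='number')
    y = data[target]

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)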
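The columns listed in the Dataset section do not include Sales, so a target has to come from somewhere. If each row describes one registered vehicle (an assumption about this dataset, not something stated above), one possible stand-in is the number of rows per make, model, and model year. The sketch below uses made-up rows, and the column names `registrations` and `mean_range` are hypothetical.

```python
import pandas as pd

# Illustrative rows; in practice this would be the loaded EV DataFrame.
df = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "TESLA", "NISSAN", "NISSAN", "KIA"],
    "Model": ["MODEL 3", "MODEL 3", "MODEL Y", "LEAF", "LEAF", "NIRO"],
    "Model Year": [2020, 2020, 2021, 2019, 2019, 2022],
    "Electric Range": [250, 250, 290, 150, 150, 240],
})

# Hypothetical popularity target: row count per make/model/year,
# with the mean electric range kept as an example feature.
popularity = (
    df.groupby(["Make", "Model", "Model Year"], as_index=False)
      .agg(registrations=("Electric Range", "size"),
           mean_range=("Electric Range", "mean"))
)
print(popularity)
```

A table like this could then take the place of `data`, with `registrations` as the target column instead of Sales.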
Explanation:
- Categorical features like Make, Model, and Electric Vehicle Type are encoded using LabelEncoder. Numerical features are scaled to normalize the data. The dataset is split into training and testing sets for evaluation.
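One thing to watch in this step is that fitting the scaler on the full dataset before splitting lets test-set statistics leak into training. A scikit-learn `Pipeline` with a `ColumnTransformer` avoids this by learning encodings and scaling parameters from the training split only. The sketch below is an alternative arrangement, not the method used above: it swaps `LabelEncoder` for `OneHotEncoder`, uses synthetic data, and keeps Sales as a placeholder target.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data; "Sales" is a placeholder target, as in the text above.
rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "Make": rng.choice(["A", "B", "C"], size=n),
    "Electric Range": rng.uniform(50, 300, size=n),
    "Base MSRP": rng.uniform(25000, 90000, size=n),
})
df["Sales"] = 0.5 * df["Electric Range"] + rng.normal(0, 5, size=n)

X = df.drop(columns=["Sales"])
y = df["Sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# One-hot encode the categorical column and standardize the numeric ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Make"]),
    ("num", StandardScaler(), ["Electric Range", "Base MSRP"]),
])
model = Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])

# fit() learns encoder categories and scaling statistics from X_train only.
model.fit(X_train, y_train)
print("Test R^2:", round(model.score(X_test, y_test), 3))
```

One-hot encoding also avoids implying an order between makes, which integer codes can do for linear models.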
5. Data Visualization
Electric Range by Vehicle Type
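    # Bar plot for Electric Range by Vehicle Type
    # Note: if step 4 has already run, 'Electric Vehicle Type' holds integer
    # codes and 'Electric Range' is standardized, so plot the unencoded data
    # to get readable labels and a range in miles
    sns.barplot(x='Electric Vehicle Type', y='Electric Range', data=data, palette='coolwarm')
    plt.title('Electric Range by Vehicle Type')
    plt.xlabel('Electric Vehicle Type')
    plt.ylabel('Electric Range (in miles)')
    plt.show()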
Popular EV Makes
    # Top 10 EV makes by frequency
    # Note: if step 4 has already run, 'Make' holds integer codes rather than names
    top_makes = data['Make'].value_counts().head(10)
    plt.figure(figsize=(10, 6))
    top_makes.plot(kind='bar', color='blue')
    plt.title('Top 10 EV Makes by Frequency')
    plt.xlabel('Make')
    plt.ylabel('Count')
    plt.xticks(rotation=45)
    plt.show()
Explanation:
- Visualization provides insights into the most common EV makes and how electric range varies across different vehicle types.
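The numbers behind both charts can be computed directly with pandas, which is a quick way to sanity-check what the plots show. A minimal sketch with made-up rows (the vehicle type labels are illustrative), run on data that has not been encoded or scaled:

```python
import pandas as pd

# Illustrative, unencoded rows standing in for the EV data.
df = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "NISSAN", "KIA", "TESLA", "NISSAN"],
    "Electric Vehicle Type": ["BEV", "BEV", "BEV", "PHEV", "BEV", "BEV"],
    "Electric Range": [250, 290, 150, 30, 310, 80],
})

# Counts behind the "top makes" bar chart.
top_makes = df["Make"].value_counts().head(10)
print(top_makes)

# Means behind the "Electric Range by Vehicle Type" bar plot
# (seaborn's barplot shows the mean of each group by default).
range_by_type = df.groupby("Electric Vehicle Type")["Electric Range"].mean()
print(range_by_type)
```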
6. Model Building & Evaluation
Linear Regression
    from sklearn.linear_model import LinearRegression

    # Train linear regression model
    linreg = LinearRegression()
    linreg.fit(X_train, y_train)

    # Make predictions
    y_pred_linreg = linreg.predict(X_test)

    # Evaluate the model
    print("Linear Regression R^2 Score:", linreg.score(X_test, y_test))
Random Forest Regressor
    from sklearn.ensemble import RandomForestRegressor

    # Train Random Forest Regressor model
    rf_model = RandomForestRegressor()
    rf_model.fit(X_train, y_train)

    # Make predictions
    y_pred_rf = rf_model.predict(X_test)

    # Evaluate the model
    print("Random Forest R^2 Score:", rf_model.score(X_test, y_test))
XGBoost Regressor
    from xgboost import XGBRegressor

    # Train XGBoost Regressor model
    xgb_model = XGBRegressor()
    xgb_model.fit(X_train, y_train)

    # Make predictions
    y_pred_xgb = xgb_model.predict(X_test)

    # Evaluate the model
    print("XGBoost R^2 Score:", xgb_model.score(X_test, y_test))
Explanation:
- Multiple regression models are trained and evaluated to predict EV sales based on input features. R² scores are used to assess model performance.
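A single train-test split can give a noisy picture of which model is better. One way to make the comparison steadier is k-fold cross-validation, sketched below on a synthetic regression problem. The data and the `models` and `results` dictionaries are illustrative assumptions, and XGBoost is left out only to keep the example dependent on scikit-learn alone.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem standing in for the EV features and target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Mean 5-fold cross-validated R^2 for each model.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
for name, score in results.items():
    print(f"{name}: mean R^2 = {score:.3f}")

# Pick the model with the highest mean score.
best_model = max(results, key=results.get)
print("Best by mean R^2:", best_model)
```

The same loop accepts any estimator with the scikit-learn interface, so an `XGBRegressor` could be added to the dictionary when the package is installed.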
7. Conclusion
This Electric Vehicle Popularity Analysis Using Machine Learning project demonstrates how to preprocess data, analyze trends, and build effective regression models to predict EV adoption trends. Among the models tested, [insert best-performing model here] achieved the highest R² score. These insights can guide EV manufacturers and policymakers in making data-driven decisions.
Download the dataset and try experimenting with additional models like neural networks to further improve predictions. Share your findings with the community!