Electric Vehicle Popularity Analysis Using Machine Learning

KANGKAN KALITA

The adoption of electric vehicles (EVs) is growing rapidly worldwide, driven by technological advancements and environmental concerns. Understanding the factors influencing EV popularity can help manufacturers, policymakers, and researchers strategize better. This project focuses on Electric Vehicle Popularity Analysis Using Machine Learning, where we analyze EV-related data to uncover trends and build predictive models to understand the factors driving EV adoption.

Objective:

  • Perform an analysis of electric vehicle adoption using machine learning models.
  • Explore the dataset to identify key factors influencing EV popularity.
  • Build predictive models to analyze EV adoption trends and evaluate their accuracy.

Dataset:
The dataset contains columns such as VIN (1-10), County, City, State, Postal Code, Model Year, Make, Model, Electric Vehicle Type, Clean Alternative Fuel Vehicle (CAFV) Eligibility, Electric Range, Base MSRP, Legislative District, and Vehicle Location. These features are used throughout the analysis.

Tools & Libraries:

  • Python
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn
  • XGBoost
  • Jupyter Notebook or Google Colab (Recommended)

1. Data Collection & Setup

Import Libraries and Load Dataset

# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error  # regression metrics

# Load the dataset
data = pd.read_csv('/path/to/ev_data.csv')  # Replace with the actual file path

# Preview the dataset
data.head()

Explanation:

  • The dataset is loaded using pandas, and the first few rows are displayed to understand its structure. Common columns include Electric Range, Base MSRP, and other EV-related features.

2. Exploratory Data Analysis (EDA)

Dataset Overview

# Display dataset information
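data.info()                   # column types and non-null counts
print(data.describe())        # summary statistics for numeric columns
print(data.isnull().sum())    # missing values per column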

Electric Range Distribution

# Visualize the distribution of Electric Range
sns.histplot(data['Electric Range'], kde=True, color='green')
plt.title('Distribution of Electric Range')
plt.xlabel('Electric Range (in miles)')
plt.ylabel('Frequency')
plt.show()

Correlation Analysis

# Correlation heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(data.corr(numeric_only=True), annot=True, cmap='coolwarm')  # text columns cannot be correlated
plt.title('Correlation Heatmap')
plt.show()

Base MSRP vs. Electric Range

# Scatter plot for Base MSRP vs. Electric Range
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Base MSRP', y='Electric Range', hue='Electric Vehicle Type', data=data, palette='viridis')
plt.title('Base MSRP vs. Electric Range by Vehicle Type')
plt.xlabel('Base MSRP (in USD)')
plt.ylabel('Electric Range (in miles)')
plt.show()

Explanation:

  • EDA helps identify missing values, understand the distribution of key features, and uncover relationships between variables such as Base MSRP and Electric Range.
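
The missing-value check above can be made easier to read by reporting percentages alongside raw counts. The following is a minimal sketch; the helper name `missing_summary` and the toy values are assumptions made for illustration and do not come from the real dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Toy frame standing in for the EV dataset (made-up values)
toy = pd.DataFrame({
    "Make": ["TESLA", "NISSAN", None, "BMW"],
    "Electric Range": [215.0, np.nan, 84.0, np.nan],
    "Base MSRP": [69900.0, 31000.0, 42000.0, 44100.0],
})

report = missing_summary(toy)
print(report)
```

On the real data, calling `missing_summary(data)` should show which columns need imputation and which are complete.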

3. Data Cleaning

Handle Missing Values

# Fill missing values in critical columns
data['Electric Range'] = data['Electric Range'].fillna(data['Electric Range'].median())
data['Base MSRP'] = data['Base MSRP'].fillna(data['Base MSRP'].mean())

# Drop rows with missing `Model` or `Make`
data.dropna(subset=['Model', 'Make'], inplace=True)

Remove Outliers

# Remove outliers in `Electric Range` and `Base MSRP`
numerical_cols = ['Electric Range', 'Base MSRP']
for col in numerical_cols:
    upper_limit = data[col].mean() + 3 * data[col].std()
    lower_limit = data[col].mean() - 3 * data[col].std()
    data = data[(data[col] >= lower_limit) & (data[col] <= upper_limit)]

Explanation:

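  • Missing values in numerical columns are filled with the median (Electric Range) or the mean (Base MSRP). Rows missing critical fields such as Model or Make are dropped. Values more than three standard deviations from the column mean are treated as outliers and removed, which keeps extreme values from dominating the regression fits.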
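
The three-standard-deviation rule can also be wrapped in a reusable function. This is a sketch under stated assumptions: the name `remove_outliers_3sigma` is hypothetical, and unlike the loop above it computes every column's limits on the unfiltered data before dropping any rows, so results may differ slightly. Rows with missing values in the listed columns are dropped as well, because `between` evaluates to False for NaN.

```python
import pandas as pd


def remove_outliers_3sigma(df, columns, n_std=3.0):
    """Keep rows lying within mean ± n_std * std for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        mean, std = df[col].mean(), df[col].std()
        # between() is inclusive on both ends and False for NaN
        mask &= df[col].between(mean - n_std * std, mean + n_std * std)
    return df[mask]


# Toy data: 29 plausible ranges plus one obviously bad value (made up)
toy = pd.DataFrame({
    "Electric Range": [100.0 + i for i in range(29)] + [5000.0],
})

cleaned = remove_outliers_3sigma(toy, ["Electric Range"])
print(len(toy), "rows before,", len(cleaned), "rows after")
```

On the real data the equivalent call would be `remove_outliers_3sigma(data, ['Electric Range', 'Base MSRP'])`.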

4. Data Preprocessing & Feature Engineering

Encode Categorical Features

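# Encode categorical variables, keeping one fitted encoder per column
# so the original labels can be recovered later with inverse_transform
categorical_cols = ['Make', 'Model', 'Electric Vehicle Type', 'County', 'State']
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    data[col] = encoders[col].fit_transform(data[col].astype(str))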

Feature Scaling

# Scale numerical features
scaler = StandardScaler()
numerical_cols = ['Electric Range', 'Base MSRP']
data[numerical_cols] = scaler.fit_transform(data[numerical_cols])

Train-Test Split

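# Split data into features and target
# NOTE: 'Sales' is a placeholder; replace it with the target column in your data
target = 'Sales'
# Keep numeric features only; remaining text columns would break the models.
# Any numeric column that still contains NaN must be imputed or dropped first.
X = data.drop(columns=['VIN (1-10)', target]).select_dtypes(include='number')
y = data[target]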

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Explanation:

  • Categorical features like Make, Model, and Electric Vehicle Type are converted to integer codes using LabelEncoder. Numerical features are standardized to zero mean and unit variance with StandardScaler. The dataset is then split 70/30 into training and testing sets for evaluation.
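
The split above assumes a `Sales` column, which is not among the columns listed in the Dataset section. One possible stand-in, offered only as a sketch, is to count rows per Make, Model, and Model Year and use that count as a popularity target. This assumes each row represents one registered vehicle, which should be verified against the data documentation; the column name `registrations` and the toy rows are made up for illustration.

```python
import pandas as pd

# Toy rows standing in for the EV dataset (made-up values)
toy = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "TESLA", "NISSAN", "NISSAN", "BMW"],
    "Model": ["MODEL 3", "MODEL 3", "MODEL Y", "LEAF", "LEAF", "I3"],
    "Model Year": [2020, 2020, 2021, 2019, 2019, 2018],
    "Electric Range": [266, 266, 291, 150, 150, 153],
})

group_cols = ["Make", "Model", "Model Year"]

# One row per make/model/year, with the row count as the popularity target
popularity = (
    toy.groupby(group_cols, as_index=False)
       .agg(registrations=("Electric Range", "size"),
            electric_range=("Electric Range", "median"))
)
print(popularity)
```

The aggregated frame could then replace `data` in the split, with `registrations` as the target instead of `Sales`.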

5. Data Visualization

Electric Range by Vehicle Type

# Bar plot for Electric Range by Vehicle Type
sns.barplot(x='Electric Vehicle Type', y='Electric Range', data=data, palette='coolwarm')
plt.title('Electric Range by Vehicle Type')
plt.xlabel('Electric Vehicle Type')
plt.ylabel('Electric Range (in miles)')
plt.show()

Popular EV Makes

# Top 10 EV makes by frequency
top_makes = data['Make'].value_counts().head(10)
plt.figure(figsize=(10, 6))
top_makes.plot(kind='bar', color='blue')
plt.title('Top 10 EV Makes by Frequency')
plt.xlabel('Make')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

Explanation:

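  • Visualization provides insights into the most common EV makes and how electric range varies across vehicle types. Because these plots run after the encoding and scaling in step 4, the axes show integer category codes and standardized range values rather than make names and miles; run them before step 4 if you want human-readable labels.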

6. Model Building & Evaluation

Linear Regression

from sklearn.linear_model import LinearRegression

# Train linear regression model
linreg = LinearRegression()
linreg.fit(X_train, y_train)

# Make predictions
y_pred_linreg = linreg.predict(X_test)

# Evaluate the model
print("Linear Regression R^2 Score:", linreg.score(X_test, y_test))

Random Forest Regressor

from sklearn.ensemble import RandomForestRegressor

# Train Random Forest Regressor model
rf_model = RandomForestRegressor(random_state=42)  # fixed seed for reproducible scores
rf_model.fit(X_train, y_train)

# Make predictions
y_pred_rf = rf_model.predict(X_test)

# Evaluate the model
print("Random Forest R^2 Score:", rf_model.score(X_test, y_test))

XGBoost Regressor

from xgboost import XGBRegressor

# Train XGBoost Regressor model
xgb_model = XGBRegressor(random_state=42)  # fixed seed for reproducible scores
xgb_model.fit(X_train, y_train)

# Make predictions
y_pred_xgb = xgb_model.predict(X_test)

# Evaluate the model
print("XGBoost R^2 Score:", xgb_model.score(X_test, y_test))

Explanation:

  • Multiple regression models are trained and evaluated to predict EV sales based on input features. R² scores are used to assess model performance.
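
The comparison can also be run in a single loop that reports mean absolute error alongside R². The sketch below uses synthetic data with a made-up `Sales` target, so the scores say nothing about real EV data. It wraps preprocessing in a scikit-learn `Pipeline` so the scaler and encoder are fitted on the training split only, and it uses one-hot encoding in place of LabelEncoder. XGBoost is left out to keep the example dependency-free, but an `XGBRegressor` could be added to the `models` dictionary in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the cleaned dataset (all values are made up)
rng = np.random.default_rng(42)
n = 400
synthetic = pd.DataFrame({
    "Make": rng.choice(["TESLA", "NISSAN", "BMW"], size=n),
    "Electric Range": rng.uniform(50, 350, size=n),
    "Base MSRP": rng.uniform(25000, 90000, size=n),
})
# Hypothetical target: an invented linear relation plus noise
synthetic["Sales"] = (
    0.5 * synthetic["Electric Range"]
    - 0.001 * synthetic["Base MSRP"]
    + rng.normal(0, 5, size=n)
)

X = synthetic.drop(columns=["Sales"])
y = synthetic["Sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scale numeric columns and one-hot encode the categorical column
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["Electric Range", "Base MSRP"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Make"]),
])

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    pipe.fit(X_train, y_train)  # preprocessing is fitted on training data only
    pred = pipe.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: R^2 = {scores['r2']:.3f}, MAE = {scores['mae']:.2f}")
```

Reporting MAE next to R² shows the typical error in the target's own units, which is often easier to interpret than R² alone.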

7. Conclusion

This project demonstrates how to preprocess EV data, analyze trends, and build regression models to predict EV adoption. Compare the R² scores printed for Linear Regression, Random Forest, and XGBoost to identify the best-performing model on your data. These insights can guide EV manufacturers and policymakers in making data-driven decisions.


Download the dataset and try experimenting with additional models like neural networks to further improve predictions. Share your findings with the community!

Explore more projects like this one to build your data analysis skills.
