Machine Learning & Neural Networks Blog

Road Repair Cost Prediction

This project aims to predict the total cost of road repairs using a Linear Regression model. The process involves data preprocessing, training a machine learning model, evaluating its performance, and visualizing the results.

This structured approach shows how data preprocessing, machine learning, and visualization can be combined into a predictive model for road repair costs. Such a model can support city planners and engineers in budget forecasting and resource allocation for road maintenance projects.

Import Necessary Libraries
The project uses these libraries:
'pandas' for data manipulation and analysis; 'numpy' for numerical operations; 'matplotlib.pyplot' for plotting graphs; 'seaborn' for statistical data visualization; 'train_test_split' from 'sklearn.model_selection' to split the data into training and testing sets; 'StandardScaler' and 'LabelEncoder' from 'sklearn.preprocessing' to standardize features and encode categorical variables; 'LinearRegression' from 'sklearn.linear_model' to fit the linear regression model; and 'mean_squared_error' and 'r2_score' from 'sklearn.metrics' to evaluate the model's performance. ('numpy' and 'seaborn' are imported, but the code that follows does not use them directly.)


 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
 import seaborn as sns
 from sklearn.model_selection import train_test_split
 from sklearn.preprocessing import StandardScaler, LabelEncoder
 from sklearn.linear_model import LinearRegression
 from sklearn.metrics import mean_squared_error, r2_score
                            

Load the Data
Load the dataset containing road repair data from a CSV file into a pandas DataFrame.


 data = pd.read_csv("file_location")  # replace "file_location" with the path to the CSV file
                            

Encode the Categorical Variables
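Convert the categorical variables Type and Condition into numbers, because scikit-learn's LinearRegression accepts only numerical inputs. LabelEncoder maps each category to an integer (0, 1, 2, ...) in alphabetical order. A linear model treats these codes as ordered, evenly spaced values. That ordering is arbitrary unless the alphabetical order happens to match a real ranking of the categories, so one-hot encoding is usually the safer choice for nominal categories.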


 le = LabelEncoder()
 data["Type"] = le.fit_transform(data["Type"])
 data["Condition"] = le.fit_transform(data["Condition"])
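To make the ordering issue concrete, the sketch below compares LabelEncoder with one-hot encoding through pandas.get_dummies, which is a common alternative for nominal features. It uses a small hypothetical DataFrame. The category values "asphalt", "concrete", "gravel", "good" and "poor" are invented for illustration and do not come from the project's dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample: category values and costs are illustrative only
df = pd.DataFrame({
    "Type": ["concrete", "asphalt", "gravel", "asphalt"],
    "Condition": ["poor", "good", "poor", "good"],
    "Total_Cost": [1200.0, 800.0, 500.0, 900.0],
})

# LabelEncoder assigns codes in sorted order: asphalt=0, concrete=1, gravel=2
le = LabelEncoder()
type_codes = le.fit_transform(df["Type"])
print(le.classes_, type_codes)

# One-hot encoding: one 0/1 column per category; drop_first=True drops one
# level per variable to avoid perfect collinearity with the intercept
one_hot = pd.get_dummies(df, columns=["Type", "Condition"], drop_first=True, dtype=int)
print(one_hot)
```

With one-hot columns, each category gets its own coefficient. The model therefore does not assume that "gravel" is twice as far from "asphalt" as "concrete" is.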
                            

Separate the Target Variable and Features
Target Variable: 'Total_Cost', which we want to predict. Features: All other columns used as inputs to the model.


 y = data["Total_Cost"]
 X = data.drop("Total_Cost", axis=1)
                            

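Scale the Features
Standardize each feature to zero mean and unit variance. For ordinary least squares, as used here, this does not change the model's predictions or its R-squared score. What it does is put the coefficients on a comparable scale, and scaling becomes important for regularized or gradient-based models. Fitting the scaler on the full dataset before splitting also lets statistics from the test rows influence the training features (data leakage). The stricter approach is to split first and fit the scaler on the training set only.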


 scaler = StandardScaler()
 X_scaled = scaler.fit_transform(X)
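The sketch below checks both effects of standardization on synthetic data, which is invented for illustration and is not the road repair dataset. First, each scaled column ends up with mean 0 and standard deviation 1. Second, LinearRegression produces the same predictions whether or not the features are scaled.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (hypothetical data)
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 5000, size=50),  # a large-valued feature
    rng.uniform(0, 1, size=50),     # a small-valued feature
])
y = 3.0 * X[:, 0] + 200.0 * X[:, 1] + rng.normal(0, 10, size=50)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each scaled column has mean ~0 and (population) standard deviation ~1
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))

# OLS with an intercept yields the same predictions on raw or scaled features;
# only the fitted coefficients differ
pred_raw = LinearRegression().fit(X, y).predict(X)
pred_scaled = LinearRegression().fit(X_scaled, y).predict(X_scaled)
print(np.allclose(pred_raw, pred_scaled))
```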
                            

Split the Data into Training and Testing Sets
Training set: 70% of the data, used to fit the model. Testing set: the remaining 30%, held out to evaluate it. Random state: random_state=42 fixes the shuffling seed, so every run produces the same split.


 X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
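The code above fits the scaler on all rows before splitting. The sketch below shows the leakage-free order instead: split the raw features first, then let a scikit-learn Pipeline fit StandardScaler on the training rows only. It uses synthetic data in place of the project's CSV. It also shows the 70/30 proportions that test_size=0.3 produces.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the features and Total_Cost (hypothetical data)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([5.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=100)

# Split the raw (unscaled) features first: 70 rows for training, 30 for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))

# The pipeline fits StandardScaler on X_train only and reuses those
# statistics to transform X_test when predict() is called
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# The scaler's learned mean comes from the training rows alone
scaler = model.named_steps["standardscaler"]
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))
```

Because scaling lives inside the pipeline, the same object can later be passed to cross-validation utilities without leaking statistics from each validation fold.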
                            

Train and Evaluate the Model
Model initialization: create an instance of the LinearRegression model. Model training: fit the model to the training data.
Predictions: use the trained model to predict Total_Cost for the testing set.
Mean Squared Error (MSE): the average squared difference between actual and predicted values, expressed in squared units of Total_Cost. Lower values indicate better performance.
R-squared score: the proportion of the variance in the target that the model explains from the features. A value of 1 means perfect predictions and 0 means no better than always predicting the mean. On a test set the score can even be negative.


 model = LinearRegression()
 model.fit(X_train, y_train)
 y_pred = model.predict(X_test)
 mse = mean_squared_error(y_test, y_pred)
 r2 = r2_score(y_test, y_pred)
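The two metrics follow directly from their definitions: MSE = (1/n) Σ (y_i − ŷ_i)² and R² = 1 − SS_res / SS_tot, where SS_tot is the sum of squared deviations of the actual values from their mean. The sketch below computes both by hand and compares the results with scikit-learn. The cost values are hypothetical and were chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted costs
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 320.0, 380.0])

def mse_manual(y_true, y_pred):
    # Mean of the squared residuals: (100 + 100 + 400 + 400) / 4 = 250
    return np.mean((y_true - y_pred) ** 2)

def r2_manual(y_true, y_pred):
    # 1 - residual sum of squares / total sum of squares around the mean
    ss_res = np.sum((y_true - y_pred) ** 2)          # 1000
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 50000
    return 1.0 - ss_res / ss_tot                     # 1 - 0.02 = 0.98

print(mse_manual(y_true, y_pred), mean_squared_error(y_true, y_pred))
print(r2_manual(y_true, y_pred), r2_score(y_true, y_pred))

# Predictions worse than always guessing the mean give a negative score:
# R^2 = 1 - 200000 / 50000 = -3
y_bad = np.array([400.0, 300.0, 200.0, 100.0])
print(r2_score(y_true, y_bad))
```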
                            

Visualize the Model's Predictions
Scatter Plot: Plot actual vs. predicted values to visualize the model's performance. Diagonal Line: The dashed line represents a perfect prediction. The closer the points are to this line, the better the model's predictions.


 plt.scatter(y_test, y_pred, alpha=0.5)
 plt.xlabel("Actual Total Cost")
 plt.ylabel("Predicted Total Cost")
 plt.title("Actual vs. Predicted Total Cost of Road Repairs")
 plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], "k--", linewidth=2)
 plt.show()
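The plotting code above draws the dashed reference line over the range of y_test only, so any prediction outside that range lies beyond the line's ends. The sketch below is a reusable helper, hypothetically named plot_actual_vs_predicted, that instead spans the combined range of actual and predicted values. It also returns the figure, so the caller can choose between plt.show() and fig.savefig(...). The sample values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_actual_vs_predicted(y_true, y_pred):
    """Scatter actual vs. predicted values with a dashed y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Use the combined range so every point falls within the line's extent
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())

    fig, ax = plt.subplots()
    ax.scatter(y_true, y_pred, alpha=0.5)
    ax.plot([lo, hi], [lo, hi], "k--", linewidth=2)
    ax.set_xlabel("Actual Total Cost")
    ax.set_ylabel("Predicted Total Cost")
    ax.set_title("Actual vs. Predicted Total Cost of Road Repairs")
    return fig, ax

# Hypothetical values: one prediction (450) exceeds the largest actual value (400)
fig, ax = plot_actual_vs_predicted([100, 200, 300, 400], [110, 190, 320, 450])
```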
                            




Below is the full code with additional comments embedded.


 # Import the necessary libraries
 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
 import seaborn as sns
 from sklearn.model_selection import train_test_split
 from sklearn.preprocessing import StandardScaler, LabelEncoder
 from sklearn.linear_model import LinearRegression
 from sklearn.metrics import mean_squared_error, r2_score
        
 # Load the data
 data = pd.read_csv("file_location")
        
 # Encode the categorical variables (Type and Condition)
 le = LabelEncoder()
 data["Type"] = le.fit_transform(data["Type"])
 data["Condition"] = le.fit_transform(data["Condition"])
        
 # Separate the target variable and features
 y = data["Total_Cost"]
 X = data.drop("Total_Cost", axis=1)
        
 # Scale the features
 scaler = StandardScaler()
 X_scaled = scaler.fit_transform(X)
        
 # Split the data into training (70%) and testing (30%) sets
 X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
        
 # Initialize the linear regression model
 model = LinearRegression()
        
 # Train the model on the training data
 model.fit(X_train, y_train)
        
 # Evaluate the model
 # Make predictions on the test set
 y_pred = model.predict(X_test)
        
 # Calculate the mean squared error
 mse = mean_squared_error(y_test, y_pred)
 print("Mean Squared Error:", mse)
 
 # Calculate the R-squared score
 r2 = r2_score(y_test, y_pred)
 print("R-squared Score:", r2)
        
 # Visualize the model's predictions
 # Create a plot for actual vs. predicted values
 plt.scatter(y_test, y_pred, alpha=0.5)
 plt.xlabel("Actual Total Cost")
 plt.ylabel("Predicted Total Cost")
 plt.title("Actual vs. Predicted Total Cost of Road Repairs")
 plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], "k--", linewidth=2)
 plt.show()
                            



Get the Jupyter Notebook and the dataset used in this project.

