Traffic Management
This project walks through a complete workflow for building and evaluating a machine learning model
(a Random Forest classifier) that predicts traffic congestion in Bucharest from features such as
weather conditions, road type, and traffic volume. It also uses folium to plot the traffic
observations on an interactive map.
Together, the predictive model and the map are intended as a starting point for urban planners and
transportation authorities who want to study and reduce congestion in metropolitan areas.
Loading and Preparing Data
pd.read_csv(): Reads the CSV file ('test_data.csv') into a pandas DataFrame named 'data'.
Categorical Encoding: Converts the categorical columns (Weather Conditions, Road Type, Day of Week)
into integer codes using .astype('category').cat.codes. pandas numbers the categories in sorted
order (alphabetically for string labels), starting at 0. The conversion is needed because
scikit-learn estimators expect numerical input.
    import pandas as pd

    # Load the observations
    data = pd.read_csv('test_data.csv')
    # Replace each categorical column with its integer category codes
    data['Weather Conditions'] = data['Weather Conditions'].astype('category').cat.codes
    data['Road Type'] = data['Road Type'].astype('category').cat.codes
    data['Day of Week'] = data['Day of Week'].astype('category').cat.codes
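To make the encoding step concrete, here is a minimal sketch of how .astype('category').cat.codes assigns its integers. The sample values and the names sample, day_cat, and day_mapping are made up for illustration and are not taken from test_data.csv. Keeping the label-to-code mapping around is one way to encode later rows consistently.

```python
import pandas as pd

# Hypothetical sample; the real values live in test_data.csv.
sample = pd.DataFrame({"Day of Week": ["Monday", "Friday", "Thursday", "Monday"]})

# Convert to a categorical dtype; the inferred categories are sorted.
day_cat = sample["Day of Week"].astype("category")

# Label -> integer code, saved so that new rows can be encoded the same way.
day_mapping = {label: code for code, label in enumerate(day_cat.cat.categories)}

# Replace the labels with their integer codes.
sample["Day of Week"] = day_cat.cat.codes

print(day_mapping)
print(sample["Day of Week"].tolist())
```

Note that the codes follow alphabetical order rather than calendar order, so 'Friday' receives a lower code than 'Monday'.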
Splitting Data for Training and Testing
Feature (X) and Target (y) Split: Separates the DataFrame into features (X) and the target variable
(y), which is 'Congestion Level'. The 'Date' and 'Time' columns are dropped from X as well, so the
model does not use them.
Train-Test Split: Splits the data into training (X_train, y_train) and testing (X_test, y_test) sets
using train_test_split() from sklearn.model_selection, holding out 20% of the rows (test_size=0.2)
so that model performance can be evaluated on unseen data.
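    from sklearn.model_selection import train_test_split

    # Features: everything except the target; Date and Time are excluded for now
    X = data.drop(['Congestion Level', 'Date', 'Time'], axis=1)
    # Target: the congestion label
    y = data['Congestion Level']
    # Hold out 20% of the rows for testing; random_state fixes the shuffle
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)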
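If congested rows turn out to be much rarer than uncongested ones, a plain random split can leave the test set with very few congested examples. The sketch below is an optional variation, not part of the original project: it passes stratify to train_test_split so that both subsets keep the same class ratio. The toy data and the names toy, X_toy, and y_toy are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80 uncongested rows and 20 congested rows.
toy = pd.DataFrame({
    "Traffic Volume": range(100),
    "Congestion Level": [0] * 80 + [1] * 20,
})
X_toy = toy.drop("Congestion Level", axis=1)
y_toy = toy["Congestion Level"]

# stratify=y_toy keeps the 80/20 class ratio in both the training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Share of congested rows in each subset.
print(y_tr.mean(), y_te.mean())
```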
Training a Random Forest Classifier
Random Forest Classifier: Initializes a Random Forest classifier with 100 trees (n_estimators=100)
and sets a random seed (random_state=42) for reproducibility.
Model Training: Fits (model.fit()) the classifier on the training data (X_train, y_train) to learn
patterns and relationships between features and the target variable.
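    from sklearn.ensemble import RandomForestClassifier

    # 100 trees; random_state makes the result reproducible
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    # Learn from the training split
    model.fit(X_train, y_train)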
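Once a Random Forest is fitted, its feature_importances_ attribute gives a rough indication of which features the trees relied on. The following is a small sketch on synthetic data, assuming a made-up rule in which congestion depends only on traffic volume. The names toy_X, toy_y, and toy_model are hypothetical and the numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic features, for illustration only.
toy_X = pd.DataFrame({
    "Traffic Volume": rng.integers(100, 2000, size=200),
    "Humidity": rng.integers(20, 90, size=200),
})
# Hypothetical rule: a row is congested when the traffic volume exceeds 1000.
toy_y = (toy_X["Traffic Volume"] > 1000).astype(int)

toy_model = RandomForestClassifier(n_estimators=100, random_state=42)
toy_model.fit(toy_X, toy_y)

# Impurity-based importances, one value per feature column.
importances = pd.Series(toy_model.feature_importances_, index=toy_X.columns)
print(importances.sort_values(ascending=False))
```

On the real data the same two lines (building the Series and printing it) could be applied to model and X.columns.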
Evaluating Model Performance
Model Prediction and Evaluation: Uses the trained model to predict congestion levels on the test set
(X_test). Calculates accuracy (accuracy_score) and generates a classification report
(classification_report) to assess model performance.
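    from sklearn.metrics import accuracy_score, classification_report

    # Predict on the held-out rows
    y_pred = model.predict(X_test)
    # Overall share of correct predictions
    print("Accuracy:", accuracy_score(y_test, y_pred))
    # Per-class precision, recall, and F1
    print("Classification Report:")
    print(classification_report(y_test, y_pred))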
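Accuracy alone does not show which kind of mistake the model makes. A confusion matrix is one common complement to the report above. The sketch below uses hand-written labels rather than model output, assuming 1 means congested and 0 means not congested. The names y_true_demo and y_pred_demo are hypothetical.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = congested, 0 = not congested.
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred_demo = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes, in the order given by labels.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])

# For a binary problem the four cells unpack in this order.
tn, fp, fn, tp = cm.ravel()
print("True negatives:", tn, "False positives:", fp)
print("False negatives:", fn, "True positives:", tp)
```

With the project's own variables, the call would take y_test and y_pred instead of the demo lists.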
Predicting Congestion for a New Location
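Prediction for New Data: Creates a DataFrame (new_data) representing a new location with specific
features (latitude, longitude, weather conditions, etc.). The categorical variables (Day of Week,
Weather Conditions, Road Type) must receive the same integer codes as in the training data. Because
.astype('category').cat.codes numbers categories in sorted order, the category lists passed to
pd.Categorical need that same order and the same set of values; otherwise a label such as
'Thursday' is mapped to a different code than the one the model was trained on.
Making Prediction: Uses the trained model to predict whether the new location will be congested,
based on its features.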
    # Category lists are written in sorted order so the codes match
    # .astype('category').cat.codes; this assumes every listed value occurs in test_data.csv.
    new_data = pd.DataFrame({
        'Latitude': [44.4268],
        'Longitude': [26.1025],
        'Day of Week': pd.Categorical(['Thursday'], categories=['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday']).codes,
        'Weather Conditions': pd.Categorical(['clear'], categories=['clear', 'cloudy', 'partly cloudy', 'rainy', 'sunny']).codes,
        'Temperature': [27],
        'Humidity': [62],
        'Wind Speed': [10],
        'Road Type': pd.Categorical(['city street'], categories=['city street', 'highway']).codes,
        'Traffic Volume': [1100],
        'Traffic Speed': [45]
    })
    # Put the columns in the order the model saw during training
    new_data = new_data[X.columns]
    prediction = model.predict(new_data)
    if prediction[0] == 1:
        print("This location is predicted to be congested.")
    else:
        print("This location is predicted to be not congested.")
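A less fragile way to keep the encodings aligned is to save the categories found in the training data and reuse them for every new row, instead of typing the lists by hand. The following is a minimal sketch of that idea on a made-up two-feature dataset. The names train, road_categories, features, clf, and new_row are hypothetical and the data does not come from test_data.csv. It also shows predict_proba, which returns class probabilities rather than a single label.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training frame standing in for the real CSV.
train = pd.DataFrame({
    "Road Type": ["highway", "city street", "city street", "highway"] * 10,
    "Traffic Volume": [300, 1200, 1500, 200] * 10,
    "Congestion Level": [0, 1, 1, 0] * 10,
})

# Save the categories before the labels are replaced by codes.
road_categories = train["Road Type"].astype("category").cat.categories
train["Road Type"] = pd.Categorical(train["Road Type"], categories=road_categories).codes

features = ["Road Type", "Traffic Volume"]
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(train[features], train["Congestion Level"])

# Encode the new row with the saved categories, then match the training column order.
new_row = pd.DataFrame({
    "Road Type": pd.Categorical(["city street"], categories=road_categories).codes,
    "Traffic Volume": [1100],
})
new_row = new_row[features]

# Probabilities for each class, in the order given by clf.classes_.
proba = clf.predict_proba(new_row)[0]
print(dict(zip(clf.classes_, proba)))
```

A label that never appeared in the training data is encoded as -1 by pd.Categorical, which is worth checking for before calling the model.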
Visualizing Traffic Observations on a Map
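Creating and Saving a Map: Initializes a folium.Map centered on Bucharest (map_bucharest), then adds
one marker per traffic observation in data to a MarkerCluster. Each marker has a popup showing the
date, time, road type, and congestion status. Because 'Road Type' was encoded earlier, the popup
shows its integer code rather than the original label. Saves the map as an HTML file
('bucharest_traffic_map.html') using map_bucharest.save().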
    import folium
    from folium.plugins import MarkerCluster

    # Map centered on Bucharest
    map_bucharest = folium.Map(location=[44.4268, 26.1025], zoom_start=12)
    # Cluster nearby markers so the map stays readable
    marker_cluster = MarkerCluster().add_to(map_bucharest)
    for _, row in data.iterrows():
        congestion = "Congested" if row['Congestion Level'] == 1 else "Not Congested"
        # 'Road Type' holds the integer code assigned during encoding
        popup_text = f"Date: {row['Date']}<br>Time: {row['Time']}<br>Road Type: {row['Road Type']}<br>Congestion: {congestion}"
        folium.Marker([row['Latitude'], row['Longitude']], popup=popup_text).add_to(marker_cluster)
    # Write the interactive map to disk
    map_bucharest.save('bucharest_traffic_map.html')
Below is the full code with additional comments embedded.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, classification_report
    import folium
    from folium.plugins import MarkerCluster

    # Load data from CSV file
    data = pd.read_csv('test_data.csv')

    # Convert categorical variables into integer codes (categories are numbered in sorted order)
    data['Weather Conditions'] = data['Weather Conditions'].astype('category').cat.codes
    data['Road Type'] = data['Road Type'].astype('category').cat.codes
    data['Day of Week'] = data['Day of Week'].astype('category').cat.codes

    # Split data into features (X) and target variable (y)
    X = data.drop(['Congestion Level', 'Date', 'Time'], axis=1)  # Exclude Date and Time for now
    y = data['Congestion Level']

    # Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Initialize a Random Forest classifier
    model = RandomForestClassifier(n_estimators=100, random_state=42)

    # Train the model
    model.fit(X_train, y_train)

    # Predict on the test set
    y_pred = model.predict(X_test)

    # Evaluate model performance
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("Classification Report:")
    print(classification_report(y_test, y_pred))

    # Predict congestion for a new location.
    # Category lists are written in sorted order so the codes match the encoding above;
    # this assumes every listed value occurs in test_data.csv.
    new_data = pd.DataFrame({
        'Latitude': [44.4268],
        'Longitude': [26.1025],
        'Day of Week': pd.Categorical(['Thursday'], categories=['Friday', 'Monday', 'Saturday', 'Sunday', 'Thursday', 'Tuesday', 'Wednesday']).codes,
        'Weather Conditions': pd.Categorical(['clear'], categories=['clear', 'cloudy', 'partly cloudy', 'rainy', 'sunny']).codes,
        'Temperature': [27],
        'Humidity': [62],
        'Wind Speed': [10],
        'Road Type': pd.Categorical(['city street'], categories=['city street', 'highway']).codes,
        'Traffic Volume': [1100],
        'Traffic Speed': [45]
    })
    # Put the columns in the order the model saw during training
    new_data = new_data[X.columns]

    # Make prediction
    prediction = model.predict(new_data)
    if prediction[0] == 1:
        print("This location is predicted to be congested.")
    else:
        print("This location is predicted to be not congested.")

    # Initialize map centered on Bucharest
    map_bucharest = folium.Map(location=[44.4268, 26.1025], zoom_start=12)

    # Add markers for each traffic observation ('Road Type' shows its integer code)
    marker_cluster = MarkerCluster().add_to(map_bucharest)
    for _, row in data.iterrows():
        congestion = "Congested" if row['Congestion Level'] == 1 else "Not Congested"
        popup_text = f"Date: {row['Date']}<br>Time: {row['Time']}<br>Road Type: {row['Road Type']}<br>Congestion: {congestion}"
        folium.Marker([row['Latitude'], row['Longitude']], popup=popup_text).add_to(marker_cluster)

    # Save the map as an HTML file
    map_bucharest.save('bucharest_traffic_map.html')