Crime Prediction and Prevention
This project analyzes and visualizes crime data in Bucharest using geospatial mapping and machine
learning. It classifies each crime by severity and uses the K-Nearest Neighbors (KNN) algorithm to
predict severity from location. The approach combines data processing, machine learning, and
interactive visualization to reveal crime patterns.
The resulting map is intended as a tool for law enforcement agencies, city planners, and
researchers. It lets them visualize crime data, understand spatial distributions, and anticipate
likely crime hotspots, supporting informed decision-making and proactive crime prevention.
Importing Libraries
The libraries needed for data manipulation, visualization, and machine learning are imported:
'pandas' for handling data frames; 'folium' for creating interactive maps; 'os' for directory
operations; 'numpy' for numerical operations; and 'sklearn' (scikit-learn) for machine learning
functionality.
import pandas as pd
import folium
import os
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
Loading Data
Crime data is read from a CSV file into a pandas DataFrame. This dataset includes various attributes
such as crime type, location coordinates (latitude and longitude), and descriptions.
data = pd.read_csv("file_location")
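Since every later step depends on specific columns, it can help to check for them right after loading. The following is a minimal sketch, not part of the original code: the helper name `load_crime_data`, the `REQUIRED_COLUMNS` list, and the decision to drop rows without coordinates are assumptions for illustration, and the inline CSV rows are invented sample data. The column names themselves are the ones used elsewhere in this walkthrough.

```python
import io

import pandas as pd

# Columns that the later steps of this walkthrough read from the DataFrame.
REQUIRED_COLUMNS = ["CrimeType", "Latitude", "Longitude", "Description"]


def load_crime_data(source):
    """Read crime records and fail early if an expected column is missing.

    Hypothetical helper: `source` is anything pd.read_csv accepts
    (a file path or a file-like object).
    """
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    # Assumption: rows without coordinates cannot be mapped or used as
    # features, so they are dropped here.
    return df.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)


# Invented sample rows, used only to make the sketch runnable.
sample_csv = io.StringIO(
    "CrimeType,Latitude,Longitude,Description\n"
    "Assault,44.43,26.10,Example A\n"
    "Theft,44.44,26.11,Example B\n"
    "Robbery,,26.12,Row with a missing latitude\n"
)
data = load_crime_data(sample_csv)
print(data)
```

In the real project the call would take the CSV path instead of the in-memory sample.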
Classifying Crimes
Crimes are classified into two categories based on severity: 'Most Dangerous', which covers
Assault, Robbery, and Burglary, and 'Less Dangerous', which covers all other crimes.
most_dangerous_crimes = ['Assault', 'Robbery', 'Burglary']
data['Severity'] = data['CrimeType'].apply(
    lambda x: 'Most Dangerous' if x in most_dangerous_crimes else 'Less Dangerous'
)
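The same labelling rule can also be written without a per-row lambda. The sketch below is an alternative, not the project's own code: it uses `Series.isin` with `numpy.where` to produce the same two labels, and the four-row DataFrame is invented for illustration.

```python
import numpy as np
import pandas as pd

most_dangerous_crimes = ["Assault", "Robbery", "Burglary"]

# Invented rows for illustration only.
data = pd.DataFrame({"CrimeType": ["Assault", "Theft", "Burglary", "Vandalism"]})

# isin() builds a boolean mask; np.where picks a label for each row from it.
data["Severity"] = np.where(
    data["CrimeType"].isin(most_dangerous_crimes),
    "Most Dangerous",
    "Less Dangerous",
)
print(data)
```

On large datasets the vectorized form is usually faster than `apply`, and the resulting column is identical.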
Feature Extraction and Splitting
The features for the machine learning model are the latitude and longitude coordinates, while the
target variable is the crime severity. The data is split into training and testing sets using an
80-20 split, so the model is trained on most of the data while 20% is held out. Note that the code
in this walkthrough keeps only the training portion: the held-out 20% is assigned to '_' and is not
used for evaluation.
X = data[['Latitude', 'Longitude']]
y = data['Severity']
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)
Training the KNN Model
A K-Nearest Neighbors (KNN) classifier is initialized with 5 neighbors. The model is trained on the
training data (80% of the dataset).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
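To see how well the classifier generalizes, the held-out 20% can be kept and scored instead of discarded. The sketch below shows one way to do that under stated assumptions: the data is synthetic (two well-separated clusters generated with NumPy, not real Bucharest records), and the `stratify=y` argument and the use of `accuracy_score` are additions that do not appear in the original code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the crime data: two clusters of coordinates,
# one per severity label. These values are invented for illustration.
rng = np.random.default_rng(42)
n = 100
data = pd.DataFrame({
    "Latitude": np.concatenate(
        [rng.normal(44.40, 0.005, n), rng.normal(44.46, 0.005, n)]
    ),
    "Longitude": np.concatenate(
        [rng.normal(26.08, 0.005, n), rng.normal(26.13, 0.005, n)]
    ),
    "Severity": ["Most Dangerous"] * n + ["Less Dangerous"] * n,
})

X = data[["Latitude", "Longitude"]]
y = data["Severity"]

# Keep the test split this time. stratify=y (an assumption, not in the
# original) preserves the class balance in both portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Score the model on rows it never saw during training.
accuracy = accuracy_score(y_test, knn.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Real crime data will rarely separate this cleanly, so the score on an actual dataset is likely to be lower than on this toy example.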
Creating the Base Map
A base map centered on Bucharest is created using Folium. The coordinates for Bucharest are
[44.4268, 26.1025].
bucharest_coordinates = [44.4268, 26.1025]
base_map = folium.Map(location=bucharest_coordinates, zoom_start=12)
Adding Crime Points
Each crime's location is plotted on the map. 'Red Markers' indicate 'Most Dangerous' crimes and
'Orange Markers' indicate 'Less Dangerous' crimes. Markers are added as CircleMarkers with popups
containing crime descriptions, allowing for interactive exploration of the data.
for idx, row in data.iterrows():
    color = 'red' if row['Severity'] == 'Most Dangerous' else 'orange'
    folium.CircleMarker(
        location=(row['Latitude'], row['Longitude']),
        radius=5,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.6,
        popup=row['Description']
    ).add_to(base_map)
Making Predictions and Adding to the Map
A random sample of 10 rows from the dataset is passed to the trained KNN model, and the predicted
crime severities are added to the map. 'Blue Markers' indicate predicted 'Most Dangerous' crimes and
'Dark Blue Markers' indicate predicted 'Less Dangerous' crimes. Because the sample is drawn from the
full dataset, most of these points are likely to have been seen during training.
prediction_data = data.sample(n=10)
prediction_data['PredictedSeverity'] = knn.predict(prediction_data[['Latitude', 'Longitude']])
for idx, row in prediction_data.iterrows():
    color = 'blue' if row['PredictedSeverity'] == 'Most Dangerous' else 'darkblue'
    popup_message = 'Predicted Most Dangerous Crime' if color == 'blue' else 'Predicted Less Dangerous Crime'
    folium.CircleMarker(
        location=(row['Latitude'], row['Longitude']),
        radius=5,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.6,
        popup=popup_message
    ).add_to(base_map)
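The model can also be queried at coordinates that are not in the dataset, which is closer to what "prediction" usually means. The sketch below is illustrative only: the ten training rows and the two query locations are invented, and the `Confidence` column (the share of the 5 nearest neighbors that voted for the winning label, taken from `predict_proba`) is an addition that the original code does not compute.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Tiny invented training set: five points per severity label.
train = pd.DataFrame({
    "Latitude": [44.40, 44.41, 44.40, 44.41, 44.405,
                 44.46, 44.47, 44.46, 44.47, 44.465],
    "Longitude": [26.08, 26.08, 26.09, 26.09, 26.085,
                  26.13, 26.13, 26.14, 26.14, 26.135],
    "Severity": ["Most Dangerous"] * 5 + ["Less Dangerous"] * 5,
})

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(train[["Latitude", "Longitude"]], train["Severity"])

# Hypothetical query locations that do not appear in the training data.
queries = pd.DataFrame({
    "Latitude": [44.404, 44.466],
    "Longitude": [26.086, 26.134],
})
features = queries[["Latitude", "Longitude"]]

# predict() returns the majority label among the 5 nearest neighbors;
# predict_proba() returns the vote share for each class.
queries["PredictedSeverity"] = knn.predict(features)
queries["Confidence"] = knn.predict_proba(features).max(axis=1)
print(queries)
```

The resulting rows could be plotted with the same CircleMarker loop, for example using the confidence value in the popup text.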
Saving the Map
The output directory is specified, and if it does not exist, it is created using 'os.makedirs'. The
final interactive map, which now includes both actual and predicted crime locations, is saved as an
HTML file.
output_dir = 'folder_location'
os.makedirs(output_dir, exist_ok=True)
output_file = os.path.join(output_dir, 'crime_hotspots_with_predictions.html')
base_map.save(output_file)
print(f"Map saved to {output_file}")
Below is the full code with additional comments embedded.
# Import necessary libraries
import pandas as pd
import folium
import os
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Step 1: Read the data from CSV
# Load the dataset containing crime data. Make sure to specify the correct file path.
data = pd.read_csv("file_location")

# Step 2: Classify the crimes
# Define the criteria for classifying crimes as 'Most Dangerous' or 'Less Dangerous'.
# Here, we assume that 'Assault', 'Robbery', and 'Burglary' are considered the most dangerous crimes.
most_dangerous_crimes = ['Assault', 'Robbery', 'Burglary']  # Define your criteria
data['Severity'] = data['CrimeType'].apply(
    lambda x: 'Most Dangerous' if x in most_dangerous_crimes else 'Less Dangerous'
)

# Step 3: Train KNN model
# Extract features (latitude and longitude) and target variable (severity).
X = data[['Latitude', 'Longitude']]
y = data['Severity']

# Split the data into training and testing sets. We use 80% of the data for training.
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)  # Splitting data, you can adjust test_size

# Initialize and train the K-Nearest Neighbors (KNN) model with 5 neighbors.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Step 4: Create a base map centered on Bucharest
# Define the coordinates for Bucharest and create a base map with a zoom level of 12.
bucharest_coordinates = [44.4268, 26.1025]
base_map = folium.Map(location=bucharest_coordinates, zoom_start=12)

# Step 5: Add crime points to the map
# Iterate through the dataset and add each crime's location to the map.
# Use red color for 'Most Dangerous' crimes and orange for 'Less Dangerous' crimes.
for idx, row in data.iterrows():
    color = 'red' if row['Severity'] == 'Most Dangerous' else 'orange'
    folium.CircleMarker(
        location=(row['Latitude'], row['Longitude']),
        radius=5,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.6,
        popup=row['Description']
    ).add_to(base_map)

# Step 6: Make predictions using KNN algorithm and add blue dots for predictions to the map
# Generate a sample of the dataset for making predictions.
prediction_data = data.sample(n=10)

# Predict the severity of crimes using the trained KNN model.
prediction_data['PredictedSeverity'] = knn.predict(prediction_data[['Latitude', 'Longitude']])

# Add the predicted crime locations to the map. Use blue for 'Most Dangerous' and dark blue for 'Less Dangerous' predictions.
for idx, row in prediction_data.iterrows():
    color = 'blue' if row['PredictedSeverity'] == 'Most Dangerous' else 'darkblue'
    popup_message = 'Predicted Most Dangerous Crime' if color == 'blue' else 'Predicted Less Dangerous Crime'
    folium.CircleMarker(
        location=(row['Latitude'], row['Longitude']),
        radius=5,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.6,
        popup=popup_message
    ).add_to(base_map)

# Step 7: Save the map to a different directory
# Define the output directory and ensure it exists.
output_dir = 'folder_location'
os.makedirs(output_dir, exist_ok=True)

# Define the full path to the output file
output_file = os.path.join(output_dir, 'crime_hotspots_with_predictions.html')

# Save the map to the specified location
base_map.save(output_file)
print(f"Map saved to {output_file}")