In this project, I used U.S. election data from 2008 to 2020 to analyze the relationship between county-level voting outcomes and demographic and socioeconomic factors such as income, age, education, and employment. After data cleaning and exploratory analysis, I trained models including Random Forest and XGBoost to predict election results from these factors. When tested on the 2020 data, the predictions closely aligned with the actual results in my dataset and correctly forecast a Democratic win. Finally, I applied the trained model to 2023 census data to project the 2024 election outcome, exploring how demographic shifts could shape future voting patterns.
Comprehensive EDA: Analyzed and visualized demographic, economic, and voting trends across counties to identify influential factors.
Predictive Modeling: Used machine learning models (Random Forest, XGBoost, and Linear Regression) to predict election outcomes from socioeconomic data, illustrating the relationship between these factors and voting patterns.
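
The modeling step described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the data is synthetic, the feature names (`median_income`, `median_age`, `pct_bachelors`, `unemployment`) and the target (county-level Democratic vote share) are assumptions made for illustration, and only a scikit-learn Random Forest is shown. The one idea taken from the description is the temporal hold-out: train on 2008 to 2016 and test on 2020.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic stand-in for a county-level table (NOT the project's data):
# 200 hypothetical counties for each of four election years.
years = np.repeat([2008, 2012, 2016, 2020], 200)
n = years.size
median_income = rng.normal(55_000, 12_000, n)
median_age = rng.normal(40, 5, n)
pct_bachelors = rng.uniform(10, 60, n)
unemployment = rng.uniform(2, 12, n)
X = np.column_stack([median_income, median_age, pct_bachelors, unemployment])
feature_names = ["median_income", "median_age", "pct_bachelors", "unemployment"]

# Assumed target: Democratic vote share in [0, 1], generated from an
# arbitrary made-up relationship plus noise.
dem_share = np.clip(
    0.2 + 0.008 * pct_bachelors + 0.01 * unemployment
    - 0.002 * (median_age - 40) + rng.normal(0, 0.05, n),
    0.0, 1.0,
)

# Temporal hold-out: fit on elections before 2020, evaluate on 2020.
train = years < 2020
test = years == 2020

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[train], dem_share[train])

pred = model.predict(X[test])
mae = mean_absolute_error(dem_share[test], pred)
print(f"2020 hold-out MAE (vote share): {mae:.3f}")

# Impurity-based importances give a rough view of influential factors.
for name, importance in zip(feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

Splitting by election year rather than at random keeps 2020 information out of training, which mirrors how the model would later be applied to unseen 2023 census data.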