Election Data Cleaning Project

Built a reproducible R pipeline (developed in RStudio) to clean, standardize, and merge precinct-level PKW election records into a consistent county-level analytics dataset covering both rounds of Poland's 2020 presidential election.

Overview

This project consolidated raw precinct/commission CSV exports from PKW (Polish National Electoral Commission) into a harmonized county-level dataset, ensuring consistent schemas, clean geographic keys, and reproducible transformations in RStudio for downstream analysis and mapping.
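Consolidating precinct records into a county-level dataset is a grouped sum over a county key. Below is a minimal base-R sketch. The column names (`teryt_county`, `eligible`, `ballots_issued`, `valid_votes`) and the helper `aggregate_to_county()` are hypothetical and are not PKW's actual headers. With dplyr, the equivalent would be `group_by()` followed by `summarise()`.

```r
# Hypothetical precinct-level records; column names and keys are illustrative, not PKW's headers.
precincts <- data.frame(
  teryt_county   = c("C01", "C01", "C02"),
  eligible       = c(1000, 1500, 800),
  ballots_issued = c(600, 900, 500),
  valid_votes    = c(590, 880, 495)
)

# Sum every count column within each county key (keys are assumed to be normalized already).
aggregate_to_county <- function(df, key = "teryt_county") {
  value_cols <- setdiff(names(df), key)
  aggregate(df[value_cols], by = df[key], FUN = sum)
}

county <- aggregate_to_county(precincts)
print(county)
```

The sketch sums raw counts first and computes rates afterwards. Averaging precinct-level percentages instead would give a small precinct the same weight as a large one.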

The workflow covered schema alignment, text normalization, deduplication, outlier handling, and multi-table joins, followed by turnout and candidate-share calculations for round one and round two, statistical testing, and geospatial visualization using sf and ggplot2.
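The sketch below shows the per-round rate calculations and the cross-round join in base R. It assumes turnout is defined as ballots issued divided by eligible voters and candidate share as candidate votes divided by valid votes. It also assumes hypothetical column names, with `votes_a` standing in for any single candidate. The key check before `merge()` is an illustrative safeguard and is not necessarily how the original scripts validated the join.

```r
# Hypothetical county-level totals per round; "votes_a" stands in for any one candidate.
r1 <- data.frame(teryt_county = c("C01", "C02"), eligible = c(2500, 800),
                 ballots_issued = c(1500, 500), valid_votes = c(1470, 495),
                 votes_a = c(735, 300))
r2 <- data.frame(teryt_county = c("C02", "C01"), eligible = c(800, 2500),
                 ballots_issued = c(560, 1700), valid_votes = c(550, 1680),
                 votes_a = c(330, 840))

# Assumed definitions: turnout = ballots issued / eligible; share = candidate votes / valid votes.
add_rates <- function(df) {
  df$turnout <- df$ballots_issued / df$eligible
  df$share_a <- df$votes_a / df$valid_votes
  df
}

# Guard the join: every county key should appear in both rounds.
unmatched <- union(setdiff(r1$teryt_county, r2$teryt_county),
                   setdiff(r2$teryt_county, r1$teryt_county))
if (length(unmatched) > 0) stop("Unmatched county keys: ", paste(unmatched, collapse = ", "))

# Inner join on the county key; overlapping columns get round suffixes.
both <- merge(add_rates(r1), add_rates(r2), by = "teryt_county", suffixes = c("_r1", "_r2"))
both$turnout_change <- both$turnout_r2 - both$turnout_r1
print(both[, c("teryt_county", "turnout_r1", "turnout_r2", "turnout_change")])
```

Checking for unmatched keys before the join matters because `merge()` defaults to an inner join. Without the check, a county whose name or code was normalized differently in one round would be dropped silently.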

Technology Stack

  • R & RStudio: Primary development environment
  • dplyr, tidyr, readr: Data manipulation and import
  • ggplot2, sf, scales: Visualization and geospatial data
  • knitr: Reproducible research and documentation

Key Features

  • Robust data pipeline: end-to-end from raw CSV to analysis-ready datasets
  • Schema harmonization and geographic key standardization
  • Multi-round analysis, statistical modeling, and geospatial visualization
  • Reproducible workflow with version-controlled scripts

Challenges & Solutions

  • Inconsistent Schemas: Mapping functions to align column structures across rounds
  • Geographic Key Variations: Normalization pipeline with lowercasing and canonical name resolution
  • Non-territorial Records: Filtered out the "Zagranica" (voters abroad) and "Statki" (ships) records, which do not map to any county
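The three fixes above can be combined into one cleaning step per round, sketched here in base R. Several details are hypothetical and are not PKW's actual formats: the raw header strings in `schema_map`, the alias table, and the removal of a "powiat " name prefix. The helper names `harmonize()`, `normalize_county()`, and `clean_round()` are also illustrative.

```r
# Hypothetical raw header -> canonical column mappings per round (placeholders, not PKW headers).
schema_map <- list(
  round1 = c("Kod TERYT" = "teryt", "Powiat" = "county", "Uprawnieni" = "eligible"),
  round2 = c("TERYT" = "teryt", "powiat" = "county", "Liczba uprawnionych" = "eligible")
)

# Select and rename columns so both rounds share one schema; fail loudly on schema drift.
harmonize <- function(df, mapping) {
  missing <- setdiff(names(mapping), names(df))
  if (length(missing) > 0) stop("Missing columns: ", paste(missing, collapse = ", "))
  out <- df[names(mapping)]
  names(out) <- unname(mapping)
  out
}

# Hypothetical alias table: normalized variant -> canonical county name.
canonical_aliases <- c("m. st. warszawa" = "warszawa")

normalize_county <- function(x) {
  x <- tolower(trimws(x))        # case-fold and strip outer whitespace
  x <- gsub("\\s+", " ", x)      # collapse internal whitespace runs
  x <- sub("^powiat ", "", x)    # drop an optional "powiat " prefix (assumption)
  hit <- x %in% names(canonical_aliases)
  x[hit] <- unname(canonical_aliases[x[hit]])
  x
}

# Non-territorial units, compared after normalization.
non_territorial <- c("zagranica", "statki")

clean_round <- function(df, mapping) {
  out <- harmonize(df, mapping)
  out$county <- normalize_county(out$county)
  out[!out$county %in% non_territorial, , drop = FALSE]
}

# Illustrative round-two rows with made-up keys and counts.
raw_r2 <- data.frame(
  TERYT = c("x1", "x2", "x3"),
  powiat = c("  M. st.  Warszawa ", "Zagranica", "Statki"),
  `Liczba uprawnionych` = c(1000, 200, 10),
  check.names = FALSE
)
clean_r2 <- clean_round(raw_r2, schema_map$round2)
print(clean_r2)
```

`harmonize()` is designed to stop when an expected header is absent. A silent rename would let schema drift between rounds surface only later, as missing or NA columns after the join.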

Results & Outcomes

  • Harmonized county-level dataset ready for analysis across both rounds
  • Reproducible pipeline with clear documentation
  • Statistical insights and geospatial visualizations revealing regional patterns

Visualizations

Figure: Poland election results map


Related Projects

  • Bachelor Thesis: the research that used this cleaned election data to analyze the impact of third-force candidates on voter turnout