- Role: Data Engineering & Analytics
- Stack: Python, Postgres, Debezium, Kafka, DuckDB, Streamlit
- Year: 2026
Incremental Data Processing for NYC Taxi Ride Analytics
Research question: Can incremental batch processing provide meaningful advantages over full recomputation for large-scale datasets?
- Capture row-level changes from Postgres using Debezium, stream to Kafka, and apply incremental loads into DuckDB.
- Compare Incremental Batch pipelines against Full Recomputation using an experiment harness measuring Latency, Delta Scalability, and Resource Utilization.
- Build a Streamlit dashboard to visualise ride metrics and run A/B evaluation scenarios.
- Report the trade-offs and the thresholds at which incremental methods outperform full recomputation on NYC Taxi workloads.
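
The core comparison above can be illustrated with a minimal, dependency-free sketch. It is not the project's pipeline: it stands in plain Python dictionaries for the DuckDB tables and hand-written events for the Kafka topic. The events follow the general shape of a Debezium change envelope (`op` of `c`, `u`, `d`, or `r`, with `before` and `after` row images); the column names `ride_id`, `zone`, and `fare_cents` and the function names are assumptions made for illustration.

```python
def full_recompute(rows):
    """Baseline: rebuild per-zone (ride_count, fare_cents) by scanning every row."""
    totals = {}
    for row in rows.values():
        count, fare = totals.get(row["zone"], (0, 0))
        totals[row["zone"]] = (count + 1, fare + row["fare_cents"])
    return totals


def _adjust(totals, row, sign):
    """Add (sign=+1) or retract (sign=-1) one row's contribution to its zone."""
    count, fare = totals.get(row["zone"], (0, 0))
    count += sign
    fare += sign * row["fare_cents"]
    if count == 0:
        totals.pop(row["zone"], None)  # drop zones that no longer have rides
    else:
        totals[row["zone"]] = (count, fare)


def apply_changes(rows, totals, events):
    """Incremental path: touch only the rows named in the change events."""
    for event in events:
        before, after = event.get("before"), event.get("after")
        if event["op"] in ("u", "d"):       # retract the old row image
            _adjust(totals, before, -1)
            del rows[before["ride_id"]]
        if event["op"] in ("c", "r", "u"):  # apply the new row image
            _adjust(totals, after, +1)
            rows[after["ride_id"]] = after


# Hypothetical starting table and change batch (illustrative values only).
rows = {
    1: {"ride_id": 1, "zone": "JFK", "fare_cents": 5200},
    2: {"ride_id": 2, "zone": "Midtown", "fare_cents": 1850},
}
totals = full_recompute(rows)

events = [
    {"op": "c", "before": None,
     "after": {"ride_id": 3, "zone": "JFK", "fare_cents": 4800}},
    {"op": "u", "before": rows[2],
     "after": {"ride_id": 2, "zone": "Midtown", "fare_cents": 2100}},
    {"op": "d", "before": rows[1], "after": None},
]
apply_changes(rows, totals, events)

# Both strategies must agree on the final aggregates.
assert totals == full_recompute(rows)
```

The incremental path does work proportional to the size of the change batch, while the baseline scans the whole table on every run. That difference is what measurements of latency and delta scalability would be expected to expose as the delta grows relative to the table. Retracting the `before` image on updates and deletes relies on the source emitting full old-row images, which for Postgres generally depends on the table's replica identity setting.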