"Some see chaos. I see patterns waiting to be found."
projects
Reddit Data Pipeline
End-to-end data pipeline that extracts posts from the Reddit API, orchestrates processing through Apache Airflow with Celery workers, stages data in S3 with date partitioning, catalogues schemas via AWS Glue, transforms to Parquet with PySpark, and loads into Redshift for analytics.
Python
Airflow
AWS S3
Redshift
Glue
Athena
Terraform
Docker
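As an illustration of the date-partitioned S3 staging step, here is a minimal sketch of how an object key could be built. The Hive-style `year=/month=/day=` layout, the function name `partitioned_key`, and the example prefix are assumptions for illustration, not details taken from the project; that layout is a common choice because Glue crawlers and Athena can treat such path segments as partition columns.

```python
from datetime import date


def partitioned_key(prefix: str, run_date: date, filename: str) -> str:
    """Build an S3 object key with Hive-style date partitions.

    Hypothetical helper: the prefix and partition naming are assumed,
    not taken from the original pipeline.
    """
    return (
        f"{prefix.strip('/')}/"
        f"year={run_date:%Y}/month={run_date:%m}/day={run_date:%d}/"
        f"{filename}"
    )


if __name__ == "__main__":
    # Example: where one day's extract of Reddit posts might be staged.
    print(partitioned_key("raw/reddit", date(2024, 1, 5), "posts.json"))
```

Zero-padding the month and day keeps keys in chronological order when listed lexicographically, which makes it simpler to reprocess a range of days.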
eBay Deal Finder
Production-grade pipeline that ingests eBay listings via the Browse API, stores raw JSON in PostgreSQL, transforms it through dbt staging, intermediate, and mart layers, calculates fair market value using IQR-based outlier detection, and surfaces undervalued items through a FastAPI-powered Streamlit dashboard and email alerts.
Python
Airflow
PostgreSQL
dbt
FastAPI
Streamlit
Docker
GitHub Actions
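The IQR-based fair market value step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: in the project the transformation runs in dbt, and the choice of the median of the remaining prices as the fair value, the 1.5 fence multiplier, the 20% discount threshold, and the function names are all hypothetical.

```python
from statistics import median, quantiles


def fair_market_value(prices: list[float], k: float = 1.5) -> float:
    """Median price after dropping values outside the IQR fences.

    Assumed definition: fences at Q1 - k*IQR and Q3 + k*IQR, with k = 1.5.
    """
    if len(prices) < 4:
        # Too few listings to estimate quartiles; fall back to the plain median.
        return median(prices)
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [p for p in prices if low <= p <= high]
    return median(inliers)


def is_undervalued(price: float, fmv: float, discount: float = 0.2) -> bool:
    """Flag a listing priced at least `discount` below fair market value."""
    return price <= fmv * (1 - discount)


if __name__ == "__main__":
    sold = [100, 105, 110, 95, 102, 1000]  # made-up prices; 1000 is an outlier
    fmv = fair_market_value(sold)
    print(fmv, is_undervalued(80, fmv))
```

Filtering on the interquartile range before taking the median keeps a single mispriced or mislabelled listing from shifting the estimate, which matters when a search returns only a handful of comparable items.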