"Some see chaos. I see patterns waiting to be found."

Reddit Data Pipeline

End-to-end data pipeline that extracts posts from the Reddit API, orchestrates processing through Apache Airflow with Celery workers, stages data in S3 with date partitioning, catalogues schemas via AWS Glue, transforms to Parquet with PySpark, and loads into Redshift for analytics.

Python, Airflow, AWS, S3, Redshift, Glue, Athena, Terraform, Docker
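
The date-partitioned staging step could look something like the sketch below. The description only says the data is staged in S3 with date partitioning, so the Hive-style `year=/month=/day=` layout, the `raw/reddit` prefix, and the `partitioned_key` helper are all assumptions made for illustration, not the project's actual code. Hive-style keys are a common choice because Glue crawlers and Athena can treat the `key=value` path segments as partition columns.

```python
from datetime import date


def partitioned_key(prefix: str, run_date: date, filename: str) -> str:
    """Build a Hive-style, date-partitioned S3 object key.

    Hypothetical helper: the prefix and key layout are illustrative assumptions.
    """
    return (
        f"{prefix.strip('/')}/"
        f"year={run_date:%Y}/month={run_date:%m}/day={run_date:%d}/"
        f"{filename}"
    )


if __name__ == "__main__":
    # One extraction run writes its output under that day's partition.
    print(partitioned_key("raw/reddit", date(2024, 3, 7), "posts.json"))
```

Zero-padding the month and day keeps the keys in chronological order when listed lexicographically, which makes backfills and range scans easier to reason about.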

eBay Deal Finder

Production-grade pipeline that ingests eBay listings via the Browse API, stores raw JSON in PostgreSQL, transforms through dbt with staging, intermediate and mart layers, calculates fair market value using IQR-based outlier detection, and surfaces undervalued items through a FastAPI-powered Streamlit dashboard and email alerts.

Python, Airflow, PostgreSQL, dbt, FastAPI, Streamlit, Docker, GitHub Actions
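
As a rough illustration of the IQR-based valuation step, the sketch below drops prices outside the usual Tukey fences and takes the median of what remains as the fair market value. The description does not say how the project aggregates the filtered prices or flags a deal, so the median, the 1.5 fence multiplier, the 20% discount threshold, and the function names are assumptions.

```python
from statistics import median, quantiles


def fair_market_value(prices: list[float], k: float = 1.5) -> float:
    """Estimate fair value as the median of prices inside the IQR fences.

    Assumed approach: values outside [Q1 - k*IQR, Q3 + k*IQR] are outliers.
    """
    if len(prices) < 4:
        raise ValueError("need at least 4 prices to estimate quartiles")
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [p for p in prices if low <= p <= high]
    return median(kept)


def is_undervalued(price: float, fmv: float, discount: float = 0.2) -> bool:
    """Flag a listing priced at least `discount` below fair value (assumed rule)."""
    return price <= fmv * (1 - discount)


if __name__ == "__main__":
    sold = [100, 105, 110, 115, 120, 500]  # 500 is an obvious outlier
    fmv = fair_market_value(sold)
    print(fmv, is_undervalued(80, fmv))
```

Filtering before aggregating keeps a single mispriced or bundled listing from dragging the estimate, which matters when only a handful of comparable listings exist.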