Projects

This page lists end-to-end machine learning and data engineering projects.


Streaming with Spark (Scala)
In this project, we build a cloud-native, fully Dockerized, real-time data pipeline, orchestrated with Kubernetes and powered by Spark.

K8s, Spark, Kafka, Airflow, MongoDB

Serverless Podcast Transcription Pipeline
In this project, we use AWS Lambda, along with asynchronous Amazon Transcribe and Amazon Comprehend jobs, to create a fast, event-driven podcast transcription pipeline.

AWS Lambda, Data Warehousing
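
As a rough illustration of the event-driven step described above, the sketch below shows how a Lambda handler might turn an S3 upload notification into the parameters for an asynchronous transcription job. It is not the project's actual code: the function names, the `example-transcripts` output bucket, and the `en-US` language code are all assumptions made for the example, and the real `boto3` call is left as a comment so the snippet runs without AWS credentials.

```python
import re
from urllib.parse import unquote_plus


def build_transcription_request(event, output_bucket="example-transcripts"):
    """Turn an S3 'object created' event into transcription job parameters.

    `output_bucket` is a hypothetical bucket name used for illustration.
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # S3 URL-encodes object keys in event notifications, so decode first.
    key = unquote_plus(record["object"]["key"])
    # Job names are restricted to letters, digits, '.', '_' and '-'.
    job_name = re.sub(r"[^0-9A-Za-z._-]", "-", key)
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "LanguageCode": "en-US",  # assumed language
        "OutputBucketName": output_bucket,
    }


def handler(event, context):
    """Hypothetical Lambda entry point."""
    params = build_transcription_request(event)
    # In a deployed function, the next step would be something like:
    #   boto3.client("transcribe").start_transcription_job(**params)
    # That call returns right away; the job itself runs asynchronously.
    return params
```

Because the handler only starts the job and returns, the function stays short-lived; a second function triggered by the transcript landing in the output bucket could then pick up the next stage.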

BoundaryDM Library
This project is a scikit-learn-style implementation of a novel technique for nonlinear dimensionality reduction, developed by my advisor and me.

Machine Learning, Optimization, SciPy

Cloud Based Reddit ETL with Airflow
In this project, we build a data pipeline that extracts data using the Reddit API, transforms it into a structured format, and loads the result into a MySQL database.

Airflow, MySQL, ETLs, AWS
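
To make the transform step above concrete, here is a minimal sketch that flattens a Reddit listing response into rows ready for loading. It assumes the usual listing shape (`data.children[].data`) and picks four fields for illustration; the function name and the chosen columns are hypothetical, not taken from the project.

```python
from datetime import datetime, timezone


def transform_listing(listing):
    """Flatten a Reddit listing (assumed shape) into a list of row dicts."""
    rows = []
    for child in listing["data"]["children"]:
        post = child["data"]
        rows.append({
            "id": post["id"],
            "title": post["title"].strip(),
            "score": int(post["score"]),
            # created_utc is a Unix timestamp in seconds.
            "created_at": datetime.fromtimestamp(post["created_utc"], tz=timezone.utc),
        })
    return rows


sample = {
    "data": {
        "children": [
            {"kind": "t3", "data": {"id": "abc123", "title": " Hello ",
                                    "score": 42.0, "created_utc": 0}},
        ]
    }
}
rows = transform_listing(sample)
```

Each row dict could then be passed to a parameterized `INSERT` through whichever MySQL driver the load step uses.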

Real Time Streaming Pipeline with Kafka
In this project, we build a real-time data pipeline that streams stock market data from the Twelve Data API and uploads the result to S3.

Kafka, ETLs, AWS

NBA Podcast Data Pipeline with Airflow
In this project, I use Airflow to create a data pipeline that automatically downloads podcasts and stores podcast metadata in an SQLite database.

Airflow, ETLs
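
The metadata-storage step described above might look roughly like the sketch below, which uses Python's built-in `sqlite3` module. The table layout, column names, and `store_episodes` helper are assumptions for illustration; the idea is that re-running the task does not duplicate episodes already stored.

```python
import sqlite3


def store_episodes(conn, episodes):
    """Insert new episodes and return how many rows were actually added.

    The `episodes` table and its columns are hypothetical.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS episodes (
               link TEXT PRIMARY KEY,
               title TEXT,
               published TEXT,
               filename TEXT
           )"""
    )
    before = conn.execute("SELECT COUNT(*) FROM episodes").fetchone()[0]
    # INSERT OR IGNORE skips rows whose primary key (link) already exists,
    # so a scheduled task can be re-run safely.
    conn.executemany(
        "INSERT OR IGNORE INTO episodes (link, title, published, filename) "
        "VALUES (?, ?, ?, ?)",
        [(e["link"], e["title"], e["published"], e["filename"]) for e in episodes],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM episodes").fetchone()[0]
    return after - before
```

In an Airflow DAG, a function like this would typically sit in its own task, downstream of the task that fetches the episode list.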