J. Wilson Peoples

Machine Learning Researcher | Aspiring Data Engineer

Learn More


About Me

I am currently a Research Assistant and PhD Candidate in the Department of Mathematics at Penn State University. My research is in Machine Learning and Data Science. Recent projects I've worked on involve:

  • Manifold Learning
  • Transfer Learning
  • Operator Estimation

When I'm not researching, I pursue my passion for the practical use of data in the real world through Data Engineering and Data Science projects.

Below are some highlighted projects that I've completed on my journey from Theorist to Practitioner. For a complete list of projects and notebooks, please explore this website using the Portfolio dropdown menu at the top left of each page. Alternatively, check out my GitHub.


Highlighted Projects

Serverless Podcast Transcription Pipeline

In this project, we use AWS Lambda, along with asynchronous Amazon Transcribe and Amazon Comprehend jobs, to create a fast, event-based podcast transcription pipeline.

View Project
Streaming with Apache Spark (Scala)

In this project, we build a cloud-native, fully Dockerized, real-time data pipeline, orchestrated with Kubernetes and powered by Spark.

View Project
BoundaryDM Manifold Learning Library

This project consists of a "scikit-learn" style deployment of a novel technique for manifold learning / nonlinear dimensionality reduction, developed by my advisor and me.

View Project

Highlighted Notebooks

Big Data Workflow with Pandas and SQLite3

In this notebook, we demonstrate a workflow using pandas and SQLite3 that scales well to large datasets.

View Notebook
Implementing Efficient Joins on Mobile App Data

In this notebook, we implement a few different join algorithms and study their time and space complexity on real-world datasets.

View Notebook
Document Search Using MapReduce

In this notebook, we implement a custom MapReduce framework in Python and use it to create a document search function.

View Notebook

Highlighted Theoretical Projects

Talk Slides from SIAM Conference on Mathematics of Data Science

In the fall of 2022, I gave a talk on my research at the SIAM Conference on Mathematics of Data Science (MDS22). Attached are the slides I prepared for the talk.

View Project
Poster from FoCM 2023

I was an invited poster presenter at the ninth conference of the Foundations of Computational Mathematics (FoCM) Society, held in Paris in the summer of 2023. Attached is a PDF of my poster.

View Project
Solutions to ESL

A key part of my Data Science journey has been solving exercises from the seminal textbook The Elements of Statistical Learning. Here are my solutions to the exercises from this excellent book.

View Project