My Data Science Projects

50M+ Data Points | 510+ Securities | 95% Model Accuracy

A selection of projects I've worked on, from loan default prediction to algorithmic trading systems. Each one applies real data to a practical problem.

By the Numbers

50M+ Data Points Processed
510+ Securities Tracked
95% Peak Model Accuracy
5+ Database Platforms
24+ Years Historical Data
6M+ Monthly Throughput

Featured Projects

文学
Active

Bungaku

Billboard Music Analysis Platform

Statistical analysis of 130K+ Billboard entries using ANOVA, Tukey HSD, and t-tests to discover seasonal patterns and decade-based shifts in music trends.

Python | SciPy | Seaborn | Statistical Testing
130K+ entries | 4 statistical tests | 95% accuracy
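To give a feel for the kind of testing Bungaku describes, here is a minimal sketch of a one-way ANOVA, a Tukey HSD follow-up, and a Welch t-test in SciPy. The three seasonal groups and their values are synthetic and purely hypothetical; they are not Billboard data, and this is not the project's actual code. `scipy.stats.tukey_hsd` assumes SciPy 1.8 or newer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical "weeks on chart" samples for three seasons (synthetic data).
winter = rng.normal(loc=10.0, scale=2.0, size=200)
summer = rng.normal(loc=12.0, scale=2.0, size=200)
autumn = rng.normal(loc=10.2, scale=2.0, size=200)

# One-way ANOVA: is there any difference between the group means?
anova = stats.f_oneway(winter, summer, autumn)

# Tukey HSD: which pairs differ, with family-wise error control.
# tukey.pvalue[i, j] compares group i with group j.
tukey = stats.tukey_hsd(winter, summer, autumn)

# Welch's t-test for one planned comparison (no equal-variance assumption).
welch = stats.ttest_ind(winter, summer, equal_var=False)

print(f"ANOVA p-value: {anova.pvalue:.3g}")
print(f"Tukey winter vs summer p-value: {tukey.pvalue[0, 1]:.3g}")
print(f"Welch winter vs summer p-value: {welch.pvalue:.3g}")
```

ANOVA only says that some difference exists; the Tukey step is what identifies the specific pairs, which is why the two are usually run together.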
CT
Active

CandleThrob

Algorithmic Trading System

Trading system that tracks 510+ securities across 24+ years of historical data, calculates 113+ technical indicators, and stores the results in an Oracle database.

Python | Oracle | TA-Lib | Polygon.io
510+ securities | 113+ indicators | 24+ years data
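As an illustration of what a technical indicator computes, the sketch below hand-rolls two common ones in NumPy: a simple moving average and a simple-average RSI. CandleThrob lists TA-Lib for this; the functions here are hypothetical stand-ins written to stay dependency-light, and this RSI uses a plain rolling mean rather than Wilder's smoothing, so its values will differ somewhat from TA-Lib's.

```python
import numpy as np

def sma(close, window):
    """Simple moving average; the first window-1 entries are NaN."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    if window <= len(close):
        kernel = np.ones(window) / window
        out[window - 1:] = np.convolve(close, kernel, mode="valid")
    return out

def rsi(close, window=14):
    """RSI from plain rolling means of gains and losses (not Wilder's smoothing)."""
    close = np.asarray(close, dtype=float)
    delta = np.diff(close)                # delta[j] = close[j + 1] - close[j]
    gains = np.clip(delta, 0.0, None)
    losses = np.clip(-delta, 0.0, None)
    out = np.full(close.shape, np.nan)
    for i in range(window, len(close)):
        avg_gain = gains[i - window:i].mean()
        avg_loss = losses[i - window:i].mean()
        if avg_loss == 0:
            out[i] = 100.0                # no losses in the window
        else:
            rs = avg_gain / avg_loss
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out

# Synthetic closing prices, for demonstration only.
prices = 100 + np.cumsum(np.random.default_rng(1).normal(0, 1, size=60))
print(sma(prices, 20)[-1], rsi(prices, 14)[-1])
```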
CPP
Completed

Customer Purchase Predictor

ML-Powered Customer Analytics

Logistic classifier trained on 3.9M+ behavioral events to predict purchase triggers, with the decision threshold tuned for business outcomes rather than left at the default.

Python | SHAP | Streamlit | Feature Engineering
3.9M+ events | 95% accuracy | 0.67 threshold
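One way a business-driven threshold can be chosen is to scan candidate cutoffs and keep the one with the highest expected value. The sketch below does that under assumed payoffs: `gain_tp` and `cost_fp` are hypothetical numbers, the function name is illustrative, and nothing here is meant to reproduce how the 0.67 figure was actually derived.

```python
import numpy as np

def best_threshold(y_true, y_prob, gain_tp=10.0, cost_fp=5.0, grid=None):
    """Return (threshold, value) maximizing gain_tp * TP - cost_fp * FP.

    gain_tp and cost_fp are hypothetical payoffs: the value of acting on a
    real buyer and the cost of acting on a non-buyer.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    best_t, best_value = None, -np.inf
    for t in grid:
        pred = y_prob >= t
        tp = np.sum(pred & y_true)    # predicted buyers who did buy
        fp = np.sum(pred & ~y_true)   # predicted buyers who did not
        value = gain_tp * tp - cost_fp * fp
        if value > best_value:        # ties keep the lower threshold
            best_t, best_value = float(t), float(value)
    return best_t, best_value

# Tiny illustrative example with made-up scores.
labels = [0, 0, 1, 1]
scores = [0.2, 0.6, 0.7, 0.9]
print(best_threshold(labels, scores, grid=[0.5, 0.65, 0.8]))
```

Raising the cost of a false positive relative to the gain of a true positive pushes the chosen threshold upward, which is the usual reason a tuned cutoff ends up above 0.5.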
LC
Completed

Lending Club Risk Model

Credit Risk Assessment System

XGBoost model processing 2.2M+ loan records with SHAP explainability for transparent risk assessment decisions.

XGBoost | SHAP | Risk Modeling | Python
2.2M+ loans | 77% accuracy | 0.76 F1-score
BCC
Completed

Breast Cancer Classifier

From-Scratch ML Implementation

Logistic regression coded from scratch, reaching 95% test accuracy. It uses no ML libraries, to demonstrate the underlying mathematics and the mechanics of the algorithm.

Python | NumPy | From Scratch | Mathematical Foundations
95% accuracy | No ML libraries | Pure math
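For readers curious what "from scratch" involves, here is a minimal sketch of logistic regression trained by batch gradient descent in plain NumPy. It is an illustration under assumptions, not the project's code: the function names, learning rate, and epoch count are hypothetical, and the data is a synthetic two-cluster set rather than the breast cancer dataset.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep np.exp from overflowing on extreme inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean log-loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)        # predicted probabilities
        error = p - y                 # gradient of log-loss w.r.t. the logit
        w -= lr * (X.T @ error) / n
        b -= lr * error.mean()
    return w, b

def predict(X, w, b, threshold=0.5):
    return (sigmoid(np.asarray(X, dtype=float) @ w + b) >= threshold).astype(int)

# Synthetic two-class data: two Gaussian clusters in 2D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 1.0, size=(200, 2)),
               rng.normal(2.0, 1.0, size=(200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])
order = rng.permutation(len(y))
X, y = X[order], y[order]

# Train on the first 300 rows, evaluate on the remaining 100.
w, b = fit_logistic(X[:300], y[:300])
accuracy = (predict(X[300:], w, b) == y[300:]).mean()
print(f"held-out accuracy: {accuracy:.3f}")
```

The update rule follows from the fact that the derivative of the log-loss with respect to the logit is simply the predicted probability minus the label, so no autodiff or ML library is needed.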