Available · Data Engineer & ML Engineer · College Park, MD

Aayush Verma.

Building production-grade ML pipelines, NLP systems, and GenAI applications. 3.5+ years turning messy enterprise data into stakeholder-ready decisions at scale.

3.5+ Yrs Experience · 15M+ Records Classified · 30K+ SKUs Optimized · 4.0 MS GPA

Email Me · GitHub · LinkedIn

aayush_verma.py

# Data Engineer & ML Engineer
profile = {
    "name": "Aayush Verma",
    "role": "Data Eng / DS / MLE",
    "location": "College Park, MD",
    "work_auth": "OPT / STEM OPT eligible",
    "gpa": 4.0,
    "open_to_work": True,
    "stack": [
        "PySpark", "Databricks",
        "Airflow", "dbt", "Snowflake",
        "LangChain", "SageMaker",
    ],
    "impact": "15M+ records · 30K+ SKUs",
}

$ python reach_out.py

Technical Arsenal

Full-Stack ML Capabilities

Programming & Databases

Python · NumPy · pandas · SQL · MySQL · PostgreSQL · MS SQL Server · NoSQL · R

Machine Learning

Predictive Modeling · Model Validation · Feature Engineering · scikit-learn · Supervised Learning · Unsupervised Learning · Bagging & Boosting · Clustering · PyTorch · TensorFlow

NLP & GenAI

NLP · Transformers · LangChain · LangGraph · RAG · PydanticAI · ChromaDB · FAISS · LLM Evaluation · LCEL

Experimentation & Causal Inference

A/B Testing · Hypothesis Testing · Difference-in-Differences · Guardrail Metrics · Fixed Effects Regression · Treatment Effect Estimation

Data Engineering & Warehousing

PySpark · Airflow · Databricks · Snowflake · BigQuery · dbt · ETL/ELT · Batch Inference · SQL Validation

MLOps & Deployment

MLflow · AWS SageMaker · AWS EC2 · Docker · FastAPI · Jenkins · CI/CD · Model Monitoring · Drift Detection · Git

Analytics & Visualization

Tableau · Power BI · Streamlit · KPI Reporting · Dashboarding · Data Storytelling · SHAP · LDA Topic Modeling · Time-Series Forecasting

Career Timeline

Work Experience

CausifyAI - University of Maryland

Jan 2026 – Present

// Academic Research · College Park, MD

Developed end-to-end, notebook-based AI engineering workflows using Codex, Claude Code, LangChain, LangGraph, and PydanticAI.
Built Dockerized tutorial environments and contributed to AgenticEDA to improve reproducibility, setup consistency, and onboarding.

PACCAR - Global Quality Advanced Analytics

May 2025 – Aug 2025

// Data Scientist Intern · Lewisville, Texas

Built and deployed an NLP pipeline on AWS EC2 to classify 15M+ warranty summaries in recurring batch production workflows.
Automated warranty claim triage and prioritization, cutting manual claim-review effort by 1 hour/day.
Engineered text features using NLTK / spaCy, TF-IDF, and embeddings; benchmarked ensemble models, with Random Forest achieving 0.80 weighted F1.
Added MLflow, Airflow, and SQL validation checks for classification outputs, improving reliability before stakeholder review.
Applied LDA topic modeling and SHAP analysis to surface recurring issue themes and improve root-cause understanding.

15M+ records classified · F1: 0.80 · 1 hr/day saved
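The 0.80 weighted F1 above is a support-weighted average: per-class F1 scores are combined with each class weighted by how many true examples it has, so frequent warranty categories dominate the score. A minimal pure-Python sketch of that metric follows, assuming single-label classification; the category labels in the example are hypothetical, and the actual pipeline would more likely have used a library routine such as scikit-learn's `f1_score(y_true, y_pred, average="weighted")`.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (single-label classification)."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n  # n = tp + fn
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight each class F1 by its share of true examples
    return score


# Hypothetical warranty categories, for illustration only
y_true = ["engine", "engine", "brakes", "electrical"]
y_pred = ["engine", "brakes", "brakes", "electrical"]
print(round(weighted_f1(y_true, y_pred), 2))  # → 0.75
```

Classes that appear only in the predictions have zero support, so they contribute nothing to the weighted average, which matches how the weighted variant is usually defined.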

Mu Sigma Inc.

Oct 2021 – Jun 2024

// Data Scientist · Replenishment Optimization & Operations Analytics · Fortune 100 Client

Built Databricks, PySpark, and Airflow-based replenishment pipelines running daily batch workflows with ARIMA, SARIMA, and Prophet forecasts across 30K+ SKUs and 4 warehouses.
Improved online fill rate from 20% to 40% over 18 months in Spain, Portugal, and France by forecasting near-term demand and optimizing allocation decisions.
Deployed a CatBoost model on AWS SageMaker for assortment optimization, reducing split shipments by 5%.
Built a Streamlit decision-support tool used daily by stakeholders, improving STO recommendation accuracy by 25%.
Refactored warehouse layers with dbt-style modular SQL; reduced query runtime from 5 minutes to 1 minute across 10 Tableau dashboards.
Implemented Jenkins CI/CD, model monitoring, and drift detection to support reliable MLOps workflows.

Fill rate +20pp · 30K+ SKUs · 5x faster queries · Split shipments -5%
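The fill-rate result reads most clearly in percentage points: moving from 20% to 40% is a +20pp change (a doubling in relative terms), which is what the "Fill rate +20pp" badge reports. The sketch below assumes a simple unit fill rate (units shipped from available stock divided by units ordered); the client's exact online fill-rate definition is not stated here, and the `fill_rate` helper and order figures are hypothetical.

```python
def fill_rate(units_ordered, units_shipped_from_stock):
    """Unit fill rate: share of ordered units shipped from available stock.

    Assumed definition for illustration; shipments are capped at the ordered
    quantity so over-shipping cannot push the rate above 1.0.
    """
    ordered = sum(units_ordered)
    shipped = sum(min(s, o) for o, s in zip(units_ordered, units_shipped_from_stock))
    return shipped / ordered if ordered else 0.0


# Made-up order lines for three SKUs, before and after the forecasting changes
before = fill_rate([50, 30, 20], [10, 6, 4])   # 20 of 100 units → 0.20
after = fill_rate([50, 30, 20], [20, 12, 8])   # 40 of 100 units → 0.40
print(round((after - before) * 100, 1))        # → 20.0 (percentage points)
```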

Tata Consultancy Services

Sep 2020 – Sep 2021

// Data Analyst · India

Migrated reporting workflows to Python and SQL, tuning queries to improve reporting performance, accessibility, and user experience.

High-Impact Work

Personal Projects

A/B Testing & Causal Inference

Python-based framework for experiment analysis using Difference-in-Differences, fixed effects regression, and covariate adjustment to rigorously estimate treatment effects and validate intervention impact.

Python · DiD · Fixed Effects · Hypothesis Testing

View on GitHub →
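The core of a Difference-in-Differences estimate is a double subtraction: the change in the treated group minus the change in the control group, which removes shared time trends under the parallel-trends assumption. A minimal 2x2 sketch on group means follows; the `did_estimate` name and the numbers are hypothetical, and this is not the framework's own code, which also uses fixed effects regression and covariate adjustment.

```python
from statistics import mean


def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Classic 2x2 difference-in-differences on group means."""
    treat_change = mean(treat_post) - mean(treat_pre)  # treated: after minus before
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)     # control: after minus before
    # Subtracting the control change nets out the common time trend
    return treat_change - ctrl_change


# Hypothetical outcome values per unit
effect = did_estimate([10, 12], [15, 17], [9, 11], [11, 13])
print(effect)  # treated +5, control +2 → 3
```

In a regression of the outcome on `treated`, `post`, and their interaction, the interaction coefficient equals this same 2x2 estimate; fixed effects and covariates extend that regression form rather than the simple means shown here.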

GenAI Applications Suite

Collection of GenAI apps spanning RAG document QA, agentic chatbots, F1 analytics, and summarization, built end-to-end with LangChain, LangGraph, ChromaDB, FastAPI, Docker, and Streamlit.

LangChain · LangGraph · ChromaDB · FastAPI · Docker · Streamlit

View on GitHub →
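The retrieval step behind RAG document QA can be sketched as: vectorize the query and each document, rank documents by similarity, and place the top-k into the LLM prompt. The toy version below swaps in bag-of-words cosine similarity for the learned embeddings and ChromaDB vector store the apps use; `retrieve` and `build_prompt` are hypothetical names, not functions from LangChain or the project repositories.

```python
import math
import re
from collections import Counter


def _vectorize(text):
    """Bag-of-words term counts (stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = _vectorize(query)
    return sorted(docs, key=lambda d: _cosine(q, _vectorize(d)), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Stuff the retrieved context into a grounded prompt for the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical document store
docs = [
    "Pit stops in Formula 1 usually take under three seconds.",
    "ChromaDB stores embeddings for similarity search.",
    "LangGraph builds stateful agent workflows as graphs.",
]
print(build_prompt("How long do pit stops take?", docs, k=1))
```

A production setup replaces `_vectorize` with an embedding model and the sorted scan with a vector-store query, but the rank-then-stuff shape of the prompt stays the same.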

F1 Commentary AI

Real-time F1 race commentary generator leveraging LLM APIs and live race data, built as a Bitcamp 2025 project with an end-to-end Python backend.

Python · LLM APIs · FastAPI

View on GitHub →

Causify AI Tutorials

Contributed to an open-source AI engineering tutorial system (74+ GitHub stars). Built Dockerized environments and notebook workflows for reproducible ML experimentation and onboarding.

Docker · Jupyter · LangGraph · PydanticAI

View on GitHub →
