Available · Data Engineer & ML Engineer · College Park, MD

Aayush Verma.

Building production-grade ML pipelines, NLP systems, and GenAI applications. 3.5+ years turning messy enterprise data into stakeholder-ready decisions at scale.

3.5+ Yrs Experience
15M+ Records Classified
30K+ SKUs Optimised
4.0 MS GPA
aayush_verma.py
# Data Engineer & ML Engineer
profile = {
    "name": "Aayush Verma",
    "role": "Data Eng / DS / MLE",
    "location": "College Park, MD",
    "work_auth": "OPT / STEM OPT eligible",
    "gpa": 4.0,
    "open_to_work": True,
    "stack": [
        "PySpark", "Databricks",
        "Airflow", "dbt", "Snowflake",
        "LangChain", "SageMaker"
    ],
    "impact": "15M+ records · 30K+ SKUs"
}
$ python reach_out.py

Technical Arsenal

Full-Stack ML Capabilities

Programming & Databases
Python · NumPy · pandas · SQL · MySQL · PostgreSQL · MS SQL Server · NoSQL · R
Machine Learning
Predictive Modeling · Model Validation · Feature Engineering · scikit-learn · Supervised Learning · Unsupervised Learning · Bagging & Boosting · Clustering · PyTorch · TensorFlow
NLP & GenAI
NLP · Transformers · LangChain · LangGraph · RAG · PydanticAI · ChromaDB · FAISS · LLM Evaluation · LCEL
Experimentation & Causal Inference
A/B Testing · Hypothesis Testing · Difference-in-Differences · Guardrail Metrics · Fixed Effects Regression · Treatment Effect Estimation
Data Engineering & Warehousing
PySpark · Airflow · Databricks · Snowflake · BigQuery · dbt · ETL/ELT · Batch Inference · SQL Validation
MLOps & Deployment
MLflow · AWS SageMaker · AWS EC2 · Docker · FastAPI · Jenkins · CI/CD · Model Monitoring · Drift Detection · Git
Analytics & Visualization
Tableau · Power BI · Streamlit · KPI Reporting · Dashboarding · Data Storytelling · SHAP · LDA Topic Modeling · Time-Series Forecasting

Career Timeline

Work Experience

CausifyAI - University of Maryland
Jan 2026 – Present
// Academic Research · College Park, MD
  • Developed end-to-end, notebook-based AI engineering workflows using Codex, Claude Code, LangChain, LangGraph, and PydanticAI.
  • Built Dockerized tutorial environments and contributed to AgenticEDA to improve reproducibility, setup consistency, and onboarding.
PACCAR - Global Quality Advanced Analytics
May 2025 – Aug 2025
// Data Scientist Intern · Lewisville, Texas
  • Built and deployed an NLP pipeline on AWS EC2 to classify 15M+ warranty summaries in recurring batch production workflows.
  • Reduced manual claim-review effort by 1 hour/day by automating warranty claim triage and prioritization.
  • Engineered text features using NLTK / spaCy, TF-IDF, and embeddings; benchmarked ensemble models, with Random Forest achieving a weighted F1 of 0.80.
  • Added MLflow, Airflow, and SQL validation checks for classification outputs, improving reliability before stakeholder review.
  • Applied LDA topic modeling and SHAP analysis to surface recurring issue themes and improve root-cause understanding.
15M+ records classified · F1: 0.80 · 1 hr/day saved
Mu Sigma Inc.
Oct 2021 – Jun 2024
// Data Scientist · Replenishment Optimization & Operations Analytics · Fortune 100 Client
  • Built Databricks, PySpark, and Airflow-based replenishment pipelines using ARIMA, SARIMA, and Prophet forecasts across 30K+ SKUs and 4 warehouses in daily batch workflows.
  • Improved online fill rate from 20% to 40% over 18 months in Spain, Portugal, and France by forecasting near-term demand and optimizing allocation decisions.
  • Deployed a CatBoost model on AWS SageMaker for assortment optimization, reducing split shipments by 5%.
  • Built a Streamlit decision-support tool used daily by stakeholders, improving STO recommendation accuracy by 25%.
  • Refactored warehouse layers with dbt-style modular SQL; reduced query runtime from 5 minutes to 1 minute across 10 Tableau dashboards.
  • Implemented Jenkins CI/CD, model monitoring, and drift detection to support reliable MLOps workflows.
Fill rate +20pp · 30K+ SKUs · 5x faster queries · Split shipments -5%
Tata Consultancy Services
Sep 2020 – Sep 2021
// Data Analyst · India
  • Migrated reporting workflows to Python and SQL, tuning queries to improve reporting performance, accessibility, and user experience.

High-Impact Work

Personal Projects

01
A/B Testing & Causal Inference
Python-based framework for experiment analysis using Difference-in-Differences, fixed effects regression, and covariate adjustment to rigorously estimate treatment effects and validate intervention impact.
Python · DiD · Fixed Effects · Hypothesis Testing
View on GitHub →
02
GenAI Applications Suite
Collection of GenAI apps spanning RAG document QA, agentic chatbots, F1 analytics, and summarization — built end-to-end with LangChain, LangGraph, ChromaDB, FastAPI, Docker, and Streamlit.
LangChain · LangGraph · ChromaDB · FastAPI · Docker · Streamlit
View on GitHub →
03
F1 Commentary AI
Real-time F1 race commentary generator using LLM APIs and live race data, built as a Bitcamp 2025 project with an end-to-end Python backend.
Python · LLM APIs · FastAPI
View on GitHub →
04
Causify AI Tutorials
Contributed to an open-source AI engineering tutorial system with 74+ stars. Built Dockerized environments and notebook workflows for reproducible ML experimentation and onboarding.
Docker · Jupyter · LangGraph · PydanticAI
View on GitHub →

Academic Background

Education

2026
University of Maryland, College Park
Master of Science, Data Science
Aug 2024 – May 2026 · College Park, Maryland
GPA 4.0 / 4.0
2020
Vellore Institute of Technology
B.Tech, Electronics & Communications Engineering
Specialisation: IoT & Sensors · Jun 2016 – Jun 2020 · Vellore, India

Let's Connect

Open to new roles

Data Engineer & ML Engineer with a track record of delivering measurable impact.
Based in College Park, MD — available now. Eligible for OPT and STEM OPT work authorization.
