Sushant Shambharkar

Building AI Infrastructure,
Data Platforms &
Production ML Systems

10+ years building distributed systems, real-time data platforms, machine learning infrastructure, and AI applications across finance, cloud, analytics, and AdTech.

From streaming pipelines and feature stores to RAG platforms and multi-agent systems — I focus on systems that survive production realities: scale, latency, reliability, observability, and business impact.

Google: Senior Software Engineer

Goldman Sachs: Senior Software Engineer

Angel One: Tech Lead, AI Labs

Amagi: Tech Lead, AdTech

Slintel: Lead Software Engineer

Persistent: Software Engineer

Career & Impact

10+ Years Building Production Systems

IIT Bombay: M.Tech Computer Science

99.51 GATE Percentile

30+ Engineers Mentored

100+ Technical Interviews

Business Outcomes

35% improvement in customer engagement (Angel One)

60% reduction in advisor response latency (Angel One)

30% increase in ad CTR (Amagi)

20% increase in viewer retention (Amagi)

200% increase in dashboard creation efficiency (Slintel)

Faster dashboard performance (Slintel)

25% reduction in production failures (Google)

10x improvement in processing performance (Persistent)

About

I build systems at the intersection of data engineering and AI engineering. Over the last decade I have worked on streaming platforms, ML infrastructure, real-time decision systems, large-scale NLP systems, retrieval architectures, and production AI applications.

I care less about demos and benchmarks and more about reliability, observability, scalability, latency, and measurable business outcomes. The hardest problems in production AI are data problems — quality, drift, lineage, freshness. I treat AI pipelines like data pipelines: schema-enforced, tested, monitored.

Currently I lead AI engineering at Angel One, building agentic platforms and ML infrastructure that process millions of decisions daily. Previously I built B2B intent intelligence at Slintel, real-time ad optimization at Amagi, experimentation infrastructure at Google Cloud, and NLP platforms at Goldman Sachs.

Core Expertise

AI Systems

RAG Pipelines
Agentic AI
LLMOps
AI Evaluation
Prompt Engineering

Data Platforms

Apache Kafka
Apache Flink
Delta Lake
Snowflake
Databricks

Cloud & Infrastructure

Kubernetes
Docker
Terraform
AWS
GCP

Engineering

Python
FastAPI
Distributed Systems
System Design
MLOps

How I Think About Engineering

Reliability Before Novelty

Production systems must survive node failures, data skew, latency spikes, and bad deployments before they earn the right to use the latest model architecture.

AI Systems Are Data Systems

The hardest problems in production AI are data problems — quality, drift, lineage, freshness. Treat AI pipelines like data pipelines: schema-enforced, tested, monitored.
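A minimal sketch of what "schema-enforced" can mean at a pipeline boundary, using only the Python standard library. The `FeatureRecord` fields, `SchemaError`, and `validate_record` are hypothetical illustrations, not a schema from any system described here; a production pipeline would more typically lean on a schema registry or a validation library such as Pydantic.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FeatureRecord:
    # Hypothetical feature row; field names are illustrative only.
    user_id: str
    event_ts: float  # Unix epoch seconds
    ctr_7d: float    # 7-day click-through rate, expected in [0, 1]


class SchemaError(ValueError):
    """Raised when an incoming record violates the expected schema."""


def validate_record(raw: dict[str, Any]) -> FeatureRecord:
    """Reject malformed rows at the boundary instead of letting them reach a model."""
    expected = {"user_id": str, "event_ts": (int, float), "ctr_7d": (int, float)}
    missing = expected.keys() - raw.keys()
    if missing:
        raise SchemaError(f"missing fields: {sorted(missing)}")
    for name, types in expected.items():
        value = raw[name]
        # bool is a subclass of int, so exclude it explicitly from numeric fields.
        if isinstance(value, bool) or not isinstance(value, types):
            raise SchemaError(f"{name} has type {type(value).__name__}")
    if not 0.0 <= raw["ctr_7d"] <= 1.0:
        raise SchemaError(f"ctr_7d out of range: {raw['ctr_7d']}")
    return FeatureRecord(raw["user_id"], float(raw["event_ts"]), float(raw["ctr_7d"]))
```

The design choice is to fail loudly at ingestion: a row with a missing field or an impossible CTR raises immediately rather than silently skewing features downstream.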

Latency Is A Feature

Every millisecond added to an AI pipeline is a tax on user experience. Design for sub-50ms p99 inference from day one. Caching, batching, and quantization are not afterthoughts.
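Checking a sub-50ms p99 target means measuring the tail, not the mean. A minimal sketch using the nearest-rank percentile definition; the 50 ms budget comes from the paragraph above, while the function names and sample latencies are made up for illustration.

```python
import math


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile: smallest sample with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_slo(latencies_ms: list[float], budget_ms: float = 50.0) -> bool:
    """True if the p99 latency is strictly under the budget."""
    return percentile(latencies_ms, 99) < budget_ms


latencies = [10.0] * 97 + [120.0] * 3
print(sum(latencies) / len(latencies), percentile(latencies, 99), meets_latency_slo(latencies))
```

With 97 requests at 10 ms and 3 at 120 ms, the mean is 13.3 ms but the nearest-rank p99 is 120 ms, so the SLO check fails even though the average looks healthy.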

Observability First

If you cannot measure it, you cannot debug it. Every system I build ships with structured logging, distributed tracing, and real-time metrics dashboards before the first user hits it.
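A minimal sketch of structured logging with the standard `logging` module: each record is emitted as one JSON object so it can be indexed and queried rather than grepped. The `fields` convention and the `trace_id` value are assumptions for illustration; `trace_id` stands in for whatever identifier a tracing system (for example OpenTelemetry) would propagate, and this sketch does not integrate with a real tracer.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Fields passed via `extra={"fields": {...}}` are merged into the payload.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = make_logger("inference")
log.info("prediction served", extra={"fields": {"trace_id": "abc123", "latency_ms": 12.4}})
```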

Measure Before Optimizing

Benchmarks and intuition are not substitutes for production measurements. Instrument, collect baselines, then optimize the bottlenecks the data reveals.
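A minimal sketch of "instrument, collect baselines, then optimize": accumulate time per pipeline stage across many requests and rank stages by total cost. `StageTimer`, the stage names, and the injectable `clock` parameter are hypothetical; `clock` defaults to `time.perf_counter` and exists only so the sketch can be exercised deterministically.

```python
import time
from collections import defaultdict
from collections.abc import Callable, Iterator
from contextlib import contextmanager


class StageTimer:
    """Accumulate elapsed time per named stage across many requests."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.totals: dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the stage raised.
            self.totals[name] += self._clock() - start

    def bottlenecks(self) -> list[tuple[str, float]]:
        """Stages sorted by total time, slowest first."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)


timer = StageTimer()
for _ in range(3):
    with timer.stage("retrieve"):
        time.sleep(0.002)
    with timer.stage("generate"):
        time.sleep(0.005)
print(timer.bottlenecks())
```

Ranking by measured totals, rather than by which stage "feels" slow, is the point: the optimization target falls out of the data.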

Automation Over Repetition

Manual deployments, manual evaluation, manual rollbacks — every manual step is a risk. CI/CD for ML, automated guardrails, and self-healing infrastructure are table stakes.
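Automated rollback often reduces to a gate that compares a canary's error rate against the baseline and decides without a human in the loop. A minimal sketch; `CanaryStats`, `canary_gate`, and the thresholds (`max_error_ratio`, `min_requests`, `error_floor`) are hypothetical and would need tuning per service.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    ROLLBACK = "rollback"
    WAIT = "wait"


@dataclass(frozen=True)
class CanaryStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_gate(
    baseline: CanaryStats,
    canary: CanaryStats,
    max_error_ratio: float = 1.5,  # canary may be at most 1.5x the baseline error rate
    min_requests: int = 500,       # do not decide on too little traffic
    error_floor: float = 0.001,    # floor so a zero-error baseline does not reject every canary
) -> Decision:
    if canary.requests < min_requests:
        return Decision.WAIT
    allowed = max(baseline.error_rate, error_floor) * max_error_ratio
    return Decision.ROLLBACK if canary.error_rate > allowed else Decision.PROMOTE
```

For example, against a baseline of 20 errors in 10,000 requests, a canary with 10 errors in 1,000 requests (1%) exceeds the 0.3% allowance and is rolled back.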

Case Studies

Production AI systems I designed, built, and shipped.

Real-Time Ad Decisioning Platform

AI-powered real-time ad optimization system for streaming media, processing millions of ad decisions per second.

Python · TensorFlow · Kubernetes · Redis

Context & Problem

Traditional ad insertion systems couldn't optimize for viewer engagement in real time, leading to poor ad performance and viewer churn.

Approach & Architecture

Built an ML-powered decision engine that analyzes viewer behavior, content context, and advertiser goals to optimize ad placement in real time.

Lessons Learned

Latency is everything in ad tech: decisions must be made in under 50 ms
A/B testing at scale requires careful statistical rigor
Feature engineering for real-time systems is fundamentally different from batch
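The point about statistical rigor can be made concrete with a two-proportion z-test, a standard way to compare click-through rates between a control and a treatment arm. A minimal sketch using only the standard library; it assumes independent impressions and a sample size fixed before the test starts (repeatedly peeking at the p-value inflates false positives). The function name and example counts are invented for illustration.

```python
import math


def two_proportion_ztest(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: CTR_a == CTR_b, using the pooled proportion."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled CTR is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_ztest(1000, 50_000, 1100, 50_000)
print(f"z={z:.2f} p={p:.3f}")
```

With 1,000 clicks from 50,000 control impressions (2.0%) against 1,100 from 50,000 treatment impressions (2.2%), this gives z ≈ 2.21 and a two-sided p ≈ 0.027.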

B2B Intent Intelligence Platform

NLP-powered system for analyzing customer behavior and generating actionable intelligence for B2B sales teams.

Python · Transformers · Elasticsearch · FastAPI

Context & Problem

Sales teams spent hours manually analyzing customer interactions to identify opportunities, missing critical signals in the noise.

Approach & Architecture

Developed a transformer-based system that automatically extracts intent signals, sentiment, and opportunity scores from customer communications.

Lessons Learned

Context window limitations require careful text chunking strategies
Domain-specific fine-tuning outperforms generic models by 30%+
Explainability is crucial for user trust in AI recommendations
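One common chunking strategy for fitting long documents into a context window is a sliding window with overlap, so text near a chunk boundary appears in both neighbouring chunks. A minimal sketch that counts whitespace-separated words as a stand-in for tokens; a real system would count tokens with the model's own tokenizer, and `chunk_words` with its `max_words`/`overlap` defaults is a hypothetical helper with arbitrary values.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of at most max_words words, each sharing `overlap` words with the previous one."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(chunk_words(" ".join(str(i) for i in range(10)), max_words=4, overlap=1))
```

For the ten words "0" through "9" with windows of 4 and an overlap of 1, this yields "0 1 2 3", "3 4 5 6", and "6 7 8 9".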

Leadership & Community

30+ Engineers Mentored

Guided junior and mid-level engineers across multiple organizations

100+ Technical Interviews

Conducted interviews for engineering roles across Google, Goldman Sachs, and startups

Speaker on AI & NLP

Frequent speaker and mentor on GenAI, NLP, and distributed ML systems

Open Source Contributor

Published LangGraph templates and built autonomous AI agent infrastructure

Open Source

OpenCode

Active

AI coding agent with MCP server architecture. Multi-agent orchestration, tool execution, and autonomous code generation.

Go · MCP · Agentic AI

Dotfiles & Homelab

Active

Personal infrastructure-as-code: Kubernetes homelab, CI/CD pipelines, monitoring stack, and self-hosted AI workloads.

Kubernetes · Terraform · Ansible · Docker

LangGraph Templates

Published

Open-source templates for enterprise-grade GenAI agents using LangGraph patterns.

Python · LangGraph · LLM

Featured Writing

Jan 10, 2025 · 2 min read

Building Production RAG Systems: Lessons from the Trenches

A deep dive into building retrieval-augmented generation systems that actually work at scale, covering architecture decisions, embedding strategies, and production pitfalls.

#RAG #LLM #Production ML

Jan 2025 · 2 min read

Agentic AI: From Research to Production

How to take agentic AI systems from research prototypes to production deployments with proper reliability, monitoring, and observability.

Current Focus

Building

OpenCode AI coding agent with MCP server architecture
Multi-agent orchestration framework for production workflows
Personal AI knowledge base with RAG pipelines

Learning

Multi-agent orchestration frameworks
Advanced prompt engineering & LLM fine-tuning
Kubernetes operators & custom controllers

Reading

Building LLM Applications with Prompt Engineering
Designing Machine Learning Systems (Chip Huyen)
Reinforcement Learning: An Introduction (Sutton & Barto)

Activity

June 2026

Jun 15: OpenCode AI Agent

Jun 14: sushantdev.com v2, Living Platform

Jun 10: Building Production RAG Pipelines

Jun 8: Multi-Agent Orchestration

Jun 5: Survey of Agent Frameworks

Available for technical leadership

Looking for an AI infrastructure lead who can architect and deliver production systems?
