Writing
Blog
Technical writing on distributed systems and AI engineering — production LLM infrastructure, agent observability, RAG, and system design from Ashwani Jha.
What Every Backend Engineer Should Know About Attention
RNNs forced you to wait for token 100 before processing token 101. Transformers parallelize the whole sequence. Here's why that matters for production systems.
Teaching LLMs to reach for a calculator
A 6.7B model that knows when to call a calculator beats GPT-3 175B on math. Toolformer's self-supervised approach to tool use is worth understanding before you hardcode a tool-calling chain.
Self-correcting agents in production
The labeling bottleneck is real. Constitutional AI teaches agents to critique themselves using principles — fewer human labels, faster iteration, new tradeoffs.
LoRA Fine-Tuning: What the Microsoft Paper Actually Says
LoRA cuts trainable parameters by up to 10,000x on GPT-3 175B while matching full fine-tuning quality. Here's what Microsoft's paper actually says about low-rank adaptation.
Scaling laws are not just about research budgets
Loss follows a power law across seven orders of magnitude of compute. Kaplan et al.'s scaling laws are a decision framework — not just research trivia.
RAG in Production: Warnings from the Original Paper
RAG is in every AI pitch deck. Most skip the paper's failure modes: retrieval collapse, frozen encoders, approximate MIPS. Lewis et al. built something subtler.
MapReduce: Google's 2004 Paper and Your 2026 Decisions
Dean & Ghemawat's 2004 MapReduce decisions — stragglers, data locality, combiner functions — are the ones you still make in Spark and Flink today.
How LLMs learn to use tools
Rule-based tool routing is brittle. Supervised annotation is expensive. Toolformer shows a third path — let the model decide where tools help, filter on loss, and fine-tune. The numbers are worth understanding before you build your next agent.
How I think about agent memory
Most LLM agents are amnesiacs. The fix isn't a bigger context window — it's a memory system with four layers, explainable retrieval, and a feedback loop.
In-Context Learning Is Not Magic: What GPT-3 Actually Shows
The GPT-3 paper is cited constantly and read rarely. It documents failure modes, a data contamination bug, and benchmark gaps that matter in production.
Dynamo's Tradeoffs: What Amazon's 2007 Paper Still Teaches
Amazon's 2007 Dynamo paper defined the tradeoffs every distributed storage system still makes: eventual consistency, conflict resolution, and availability over strong consistency.
Constitutional AI: What Anthropic's Paper Actually Says
RLHF for harmlessness requires labeling harmful outputs at scale. CAI replaces that with a model critiquing itself against a written constitution.
BERT and the Fine-Tuning Paradigm: What the Paper Built
The 2018 BERT paper defined the fine-tuning paradigm behind most of the embedding models and text classifiers you use today. Here's what Devlin et al. actually built.
How I think about agent observability
Traditional APM was built for web requests, not agents that loop, retry, branch, and spend dollars per call. Here's what agent observability actually needs.
The AI agent observability stack
Agent observability is five different problems. Different tools solve different ones — traces, hallucinations, cost, drift. Here's a map and where to start.
Building a Brain: Cognitive Architecture for AI
Most AI assistants are stateless, waking up blank every session. I built Friday a brain — four-layer memory, BDI runtime, and adaptive learning.