Self-paced course
LLM Inference & Serving
How large language models actually run in production: KV-cache management, batching, speculative decoding, quantization, and the systems work that turns a model into a servable endpoint.
Curriculum