Research Notes · 003 entries · updated 2026.06

Notes on building intelligent systems.

Illustrated essays on large language models, ML efficiency, deep neural networks, and state-space models.

Featured · Large Language Model · NLP · Inference

LLM Inference in Production: A Survey of the Serving Stack in 2026

A survey of the tools, runtimes, and serving stacks available for deploying PyTorch models, from simple classifiers to frontier LLMs.

More from the Archive Idx / 002

RAG Is a Search Problem. Build It Like One.

A blueprint for retrieval systems that survive production.

RAG · Retrieval · Search
May 15, 2026
8 min

How to Deploy Transformers in Production with PyTorch and Triton Inference Server

Optimization techniques for deploying PyTorch models in a production setting to achieve low latency.

Large Language Model · NLP · Triton · PyTorch
Oct 03, 2024
5 min