Research Notes -- 003 entries · updated 2024.10
Notes on building intelligent systems.
Illustrated essays on large language models, ML efficiency, deep neural networks and state-space models.
Featured · Large Language Model · NLP · Triton · PyTorch
How to deploy Transformers in Production with PyTorch and Triton Inference Server
Optimization techniques for deploying PyTorch models in a production setting to achieve low latency.
More from the Archive Idx / 002
How to deploy Transformers to Production
Optimization techniques for deploying PyTorch models in a production setting to achieve low latency.
Large Language Model · NLP
Sep 21, 2024
4 min
How to RAG like a boss
A minimalist approach to acing Retrieval-Augmented Generation problems.
RAG · code · system design
Sep 16, 2024
2 min