skyzh / tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
☆3,774 Updated last month
Alternatives and similar repositories for tiny-llm
Users that are interested in tiny-llm are comparing it to the libraries listed below
- A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems. ☆3,297 Updated 2 weeks ago
- My learning notes for ML SYS. ☆5,242 Updated last week
- ☆2,544 Updated 3 weeks ago
- Nano vLLM ☆11,410 Updated 3 months ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,677 Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆5,094 Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆6,052 Updated this week
- An ML Systems Onboarding list ☆981 Updated last year
- slime is an LLM post-training framework for RL Scaling. ☆3,668 Updated this week
- Supercharge Your LLM with the Fastest KV Cache Layer ☆6,839 Updated this week
- Textbook on reinforcement learning from human feedback ☆1,560 Updated this week
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,759 Updated 9 months ago
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation ☆7,964 Updated 8 months ago
- FlashInfer: Kernel Library for LLM Serving ☆4,853 Updated this week
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆2,120 Updated last week
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆2,058 Updated 5 months ago
- Material for gpu-mode lectures ☆5,679 Updated last week
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch ☆1,201 Updated 5 months ago
- Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code. ☆623 Updated 11 months ago
- Large Language Model (LLM) Systems Paper List ☆1,802 Updated last week
- ☆1,462 Updated 11 months ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆550 Updated 4 months ago
- A Vector Database Tutorial (over CMU-DB's BusTub system) ☆746 Updated last year
- Renderer for the harmony response format to be used with gpt-oss ☆4,171 Updated last month
- Puzzles for learning Triton ☆2,283 Updated last year
- NanoGPT (124M) in 2 minutes ☆4,589 Updated last week
- A self-learning tutorial for CUDA High Performance Programming. ☆882 Updated 3 weeks ago
- Based on Nano-vLLM, a simple replication of vLLM with self-contained paged attention and flash attention implementation ☆353 Updated last week
- Tile primitives for speedy kernels ☆3,120 Updated this week
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch ☆691 Updated 7 months ago