skyzh / tiny-llm
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
☆3,466 · Updated last month
Alternatives and similar repositories for tiny-llm
Users interested in tiny-llm are comparing it to the libraries listed below.
- Nano vLLM ☆9,459 · Updated last month
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆1,981 · Updated this week
- My learning notes/codes for ML SYS. ☆4,374 · Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆4,113 · Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆5,617 · Updated this week
- Supercharge Your LLM with the Fastest KV Cache Layer ☆6,277 · Updated this week
- ☆2,193 · Updated last week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,357 · Updated last week
- FlashInfer: Kernel Library for LLM Serving ☆4,168 · Updated last week
- Static suckless single batch CUDA-only qwen3-0.6B mini inference engine ☆519 · Updated 3 months ago
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,687 · Updated 7 months ago
- A Vector Database Tutorial (over CMU-DB's BusTub system) ☆738 · Updated 10 months ago
- Achieve llama3 inference step by step: grasp the core concepts, master the derivation process, and implement the code. ☆612 · Updated 9 months ago
- Textbook on reinforcement learning from human feedback ☆1,344 · Updated last week
- Large Language Model (LLM) Systems Paper List ☆1,655 · Updated 2 weeks ago
- Renderer for the harmony response format to be used with gpt-oss ☆4,050 · Updated last month
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation ☆7,937 · Updated 6 months ago
- Puzzles for learning Triton ☆2,153 · Updated last year
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,911 · Updated 3 months ago
- Tile primitives for speedy kernels ☆2,980 · Updated this week
- vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization ☆2,018 · Updated last week
- Material for gpu-mode lectures ☆5,384 · Updated 2 weeks ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆537 · Updated 2 months ago
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch ☆687 · Updated 5 months ago
- An ML Systems Onboarding list ☆945 · Updated 10 months ago
- Run LLMs with MLX ☆3,003 · Updated this week
- A high-performance inference engine for AI models ☆1,384 · Updated this week
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 ☆8,809 · Updated last week
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch ☆976 · Updated 3 months ago
- ☆1,451 · Updated 9 months ago