skyzh / tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
☆3,721Updated last month
Alternatives and similar repositories for tiny-llm
Users who are interested in tiny-llm are comparing it to the repositories listed below.
- A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.☆3,067Updated this week
- My learning notes for ML SYS.☆5,077Updated last week
- ☆2,457Updated 2 weeks ago
- Nano vLLM☆10,892Updated 2 months ago
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch☆1,138Updated 4 months ago
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels☆4,739Updated last week
- Implementing DeepSeek R1's GRPO algorithm from scratch☆1,747Updated 9 months ago
- Large Language Model (LLM) Systems Paper List☆1,765Updated last week
- Supercharge Your LLM with the Fastest KV Cache Layer☆6,750Updated this week
- slime is an LLM post-training framework for RL Scaling.☆3,466Updated this week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆4,600Updated this week
- Textbook on reinforcement learning from human feedback☆1,426Updated this week
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel☆2,084Updated 2 weeks ago
- A Datacenter Scale Distributed Inference Serving Framework☆5,793Updated last week
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation☆7,957Updated 8 months ago
- Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code.☆618Updated 11 months ago
- FlashInfer: Kernel Library for LLM Serving☆4,707Updated this week
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉☆9,382Updated 2 weeks ago
- An ML Systems Onboarding list☆972Updated last year
- Minimalistic 4D-parallelism distributed training framework for educational purposes☆1,991Updated 4 months ago
- 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉☆4,936Updated last week
- Static suckless single batch CUDA-only qwen3-0.6B mini inference engine☆539Updated 4 months ago
- Renderer for the harmony response format to be used with gpt-oss☆4,146Updated last month
- vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization☆2,108Updated last week
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O☆550Updated 4 months ago
- A self-learning tutorial for CUDA High Performance Programming.☆854Updated last week
- A Vector Database Tutorial (over CMU-DB's BusTub system)☆742Updated last year
- Democratizing Reinforcement Learning for LLMs☆4,995Updated last week
- Anthropic's original performance take-home, now open for you to try!☆2,313Updated this week
- cuTile is a programming model for writing parallel kernels for NVIDIA GPUs☆1,851Updated this week