skyzh / tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
☆2,924Updated last week
Alternatives and similar repositories for tiny-llm
Users who are interested in tiny-llm are comparing it to the libraries listed below.
- ☆1,111Updated this week
- Supercharge Your LLM with the Fastest KV Cache Layer☆5,030Updated this week
- My learning notes/codes for ML SYS.☆3,451Updated this week
- Implementing DeepSeek R1's GRPO algorithm from scratch☆1,548Updated 4 months ago
- Nano vLLM☆6,091Updated 2 months ago
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch☆594Updated this week
- Renderer for the harmony response format to be used with gpt-oss☆3,699Updated 2 weeks ago
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel☆1,731Updated last week
- Implement llama3 inference step by step: grasp the core concepts, master the derivation process, and write the code.☆607Updated 6 months ago
- Textbook on reinforcement learning from human feedback☆1,193Updated last week
- An ML Systems Onboarding list☆880Updated 7 months ago
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation☆7,904Updated 3 months ago
- A Datacenter Scale Distributed Inference Serving Framework☆4,841Updated this week
- Large Language Model (LLM) Systems Paper List☆1,467Updated this week
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O☆481Updated last week
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.☆2,856Updated 5 months ago
- A Vector Database Tutorial (over CMU-DB's BusTub system)☆727Updated 7 months ago
- Everything about the SmolLM and SmolVLM family of models☆3,168Updated 3 weeks ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆3,846Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels☆1,552Updated this week
- A high-performance inference engine for AI models☆1,282Updated last week
- Minimalistic 4D-parallelism distributed training framework for educational purposes☆1,701Updated last week
- vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization☆1,722Updated this week
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch☆667Updated 2 months ago
- A collection of noteworthy MLSys bloggers (Algorithms/Systems)☆270Updated 7 months ago
- ☆1,423Updated 6 months ago
- A lightweight data processing framework built on DuckDB and 3FS.☆4,769Updated 5 months ago
- A self-learning tutorial for CUDA High Performance Programming.☆718Updated 2 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only)☆915Updated 8 months ago
- slime is an LLM post-training framework aiming for RL Scaling.☆1,496Updated this week