skyzh / tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
☆3,288Updated 2 weeks ago
Alternatives and similar repositories for tiny-llm
Users who are interested in tiny-llm are comparing it to the libraries listed below
- My learning notes/codes for ML SYS.☆3,808Updated this week
- Nano vLLM☆6,999Updated last month
- ☆1,467Updated this week
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation☆7,919Updated 4 months ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆4,054Updated this week
- Textbook on reinforcement learning from human feedback☆1,251Updated last week
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel☆1,866Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels☆3,197Updated this week
- Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code.☆609Updated 7 months ago
- A Datacenter Scale Distributed Inference Serving Framework☆5,244Updated this week
- Implementing DeepSeek R1's GRPO algorithm from scratch☆1,596Updated 5 months ago
- Supercharge Your LLM with the Fastest KV Cache Layer☆5,466Updated last week
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch☆753Updated last month
- FlashInfer: Kernel Library for LLM Serving☆3,861Updated this week
- slime is an LLM post-training framework for RL Scaling.☆2,023Updated last week
- Minimalistic 4D-parallelism distributed training framework for education purpose☆1,836Updated last month
- Large Language Model (LLM) Systems Paper List☆1,523Updated last week
- An ML Systems Onboarding list☆907Updated 8 months ago
- A self-learning tutorial for CUDA High Performance Programming.☆748Updated 3 months ago
- vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization☆1,827Updated this week
- A Vector Database Tutorial (over CMU-DB's BusTub system)☆735Updated 8 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs☆1,912Updated 6 months ago
- Material for gpu-mode lectures☆5,143Updated 2 weeks ago
- Fast, Flexible and Portable Structured Generation☆1,288Updated last week
- Renderer for the harmony response format to be used with gpt-oss☆3,854Updated last month
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch☆673Updated 3 months ago
- The simplest, fastest repository for training/finetuning small-sized VLMs.☆4,085Updated last month
- ☆1,434Updated 7 months ago
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉☆7,874Updated 3 weeks ago
- A lightweight data processing framework built on DuckDB and 3FS.☆4,788Updated 7 months ago