skyzh / tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
☆3,409 · Updated 2 weeks ago
Alternatives and similar repositories for tiny-llm
Users who are interested in tiny-llm are comparing it to the libraries listed below.
- Nano vLLM ☆8,748 · Updated 2 weeks ago
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation ☆7,926 · Updated 6 months ago
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,670 · Updated 7 months ago
- ☆2,053 · Updated 3 weeks ago
- My learning notes/codes for ML SYS. ☆4,201 · Updated this week
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel ☆1,951 · Updated this week
- Supercharge Your LLM with the Fastest KV Cache Layer ☆5,987 · Updated this week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,283 · Updated this week
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆3,945 · Updated this week
- A Vector Database Tutorial (over CMU-DB's BusTub system) ☆735 · Updated 10 months ago
- slime is an LLM post-training framework for RL Scaling. ☆2,480 · Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆5,490 · Updated this week
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch ☆684 · Updated 5 months ago
- Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code. ☆611 · Updated 8 months ago
- Textbook on reinforcement learning from human feedback ☆1,313 · Updated this week
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch ☆910 · Updated 2 months ago
- FlashInfer: Kernel Library for LLM Serving ☆4,099 · Updated this week
- Large Language Model (LLM) Systems Paper List ☆1,602 · Updated last week
- Artificial Neural Engine Machine Learning Library ☆1,252 · Updated this week
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆1,892 · Updated 2 months ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆531 · Updated 2 months ago
- Democratizing Reinforcement Learning for LLMs ☆4,737 · Updated this week
- NanoGPT (124M) in 3 minutes ☆3,822 · Updated this week
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆968 · Updated 10 months ago
- Renderer for the harmony response format to be used with gpt-oss ☆4,007 · Updated 2 weeks ago
- Everything about the SmolLM and SmolVLM family of models ☆3,408 · Updated 2 months ago
- A self-learning tutorial for CUDA High Performance Programming. ☆767 · Updated 4 months ago
- Run LLMs with MLX ☆2,868 · Updated this week
- ☆1,451 · Updated 9 months ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆15,586 · Updated this week