skyzh / tiny-llm
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
☆2,863 · Updated this week
Alternatives and similar repositories for tiny-llm
Users that are interested in tiny-llm are comparing it to the libraries listed below
- Nano vLLM ☆5,698 · Updated last month
- Supercharge Your LLM with the Fastest KV Cache Layer ☆4,177 · Updated this week
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,517 · Updated 3 months ago
- ☆972 · Updated 3 weeks ago
- A Datacenter Scale Distributed Inference Serving Framework ☆4,691 · Updated this week
- My learning notes/code for ML SYS. ☆3,211 · Updated this week
- Implement Llama 3 inference step by step: grasp the core concepts, master the derivation, and implement the code. ☆606 · Updated 5 months ago
- Renderer for the harmony response format to be used with gpt-oss ☆2,637 · Updated this week
- Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA ☆1,663 · Updated this week
- Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch ☆499 · Updated 2 weeks ago
- vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization ☆1,638 · Updated this week
- ☆1,418 · Updated 5 months ago
- Textbook on reinforcement learning from human feedback ☆1,158 · Updated this week
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆3,732 · Updated this week
- FlashInfer: Kernel Library for LLM Serving ☆3,524 · Updated this week
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation ☆7,889 · Updated 2 months ago
- A Vector Database Tutorial (over CMU-DB's BusTub system) ☆725 · Updated 6 months ago
- Artificial Neural Engine Machine Learning Library ☆1,123 · Updated 3 weeks ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,654 · Updated last month
- Large Language Model (LLM) Systems Paper List ☆1,420 · Updated last week
- An ML Systems Onboarding list ☆854 · Updated 6 months ago
- nanoGPT style version of Llama 3.1 ☆1,418 · Updated last year
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025 ☆5,263 · Updated 3 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,108 · Updated last week
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆903 · Updated 7 months ago
- A self-learning tutorial for CUDA high-performance programming. ☆697 · Updated last month
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels ☆1,506 · Updated this week
- slime is an LLM post-training framework aimed at RL scaling. ☆1,113 · Updated last week
- MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining ☆1,524 · Updated 2 months ago
- A reimplementation of Stable Diffusion 3.5 in pure PyTorch ☆656 · Updated last month