Wenyueh / MinivLLM
Based on Nano-vLLM, a simple replication of vLLM with self-contained paged attention and flash attention implementations
☆191 Updated this week
Alternatives and similar repositories for MinivLLM
Users who are interested in MinivLLM are comparing it to the libraries listed below.
- ☆467 Updated 4 months ago
- dInfer: An Efficient Inference Framework for Diffusion Language Models ☆396 Updated 2 weeks ago
- mHC kernels implemented in CUDA ☆217 Updated last week
- Block Diffusion for Ultra-Fast Speculative Decoding ☆349 Updated 2 weeks ago
- Miles is an enterprise-facing reinforcement learning framework for large-scale MoE post-training and production workloads, forked from an… ☆744 Updated this week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆227 Updated 2 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆331 Updated 2 months ago
- An early research stage expert-parallel load balancer for MoE models based on linear programming. ☆491 Updated 2 months ago
- A collection of tricks and tools to speed up transformer models ☆194 Updated last month
- Efficient LLM Inference over Long Sequences ☆393 Updated 6 months ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆890 Updated this week
- Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch ☆159 Updated 5 months ago
- QeRL enables RL for 32B LLMs on a single H100 GPU. ☆474 Updated last month
- Memory optimized Mixture of Experts ☆72 Updated 5 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆200 Updated last month
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆467 Updated 8 months ago
- fmchisel: Efficient Compression and Training Algorithms for Foundation Models ☆81 Updated 2 months ago
- Flexible and Pluggable Serving Engine for Diffusion LLMs ☆51 Updated 2 weeks ago
- Tina: Tiny Reasoning Models via LoRA ☆314 Updated 3 months ago
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models ☆389 Updated 2 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆226 Updated 10 months ago
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆318 Updated 2 months ago
- Physics of Language Models, Part 4 ☆306 Updated 2 weeks ago
- PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support ☆250 Updated this week
- minimal GRPO implementation from scratch ☆102 Updated 10 months ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning ☆61 Updated 2 months ago
- ☆99 Updated 6 months ago
- ☆224 Updated last month
- ☆233 Updated last year
- ☆132 Updated 7 months ago