Wenyueh / MinivLLM
Based on Nano-vLLM, a simple replication of vLLM with self-contained paged-attention and flash-attention implementations
☆103 · Updated this week
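For readers unfamiliar with the terms in the description: paged attention manages the KV cache in fixed-size blocks addressed through a per-sequence block table, much like virtual-memory pages. The sketch below is a minimal, hypothetical illustration of that data structure, not code taken from MinivLLM; all class, method, and parameter names (`BlockAllocator`, `PagedKVCache`, `block_size`, ...) are invented for the example.

```python
# Illustrative sketch only -- not MinivLLM's actual implementation.
# Idea: KV cache memory is split into fixed-size physical blocks, and each
# sequence keeps a block table mapping logical token slots to physical blocks,
# so sequences of different lengths share one memory pool without fragmentation.

class BlockAllocator:
    """Hands out fixed-size physical KV blocks from a shared pool."""
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise RuntimeError("KV cache is full")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class PagedKVCache:
    """Per-sequence block table: logical token slot -> (physical block, offset)."""
    def __init__(self, allocator: BlockAllocator, block_size: int = 16):
        self.allocator = allocator
        self.block_size = block_size
        self.block_table: list[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> tuple[int, int]:
        """Reserve a slot for one new token's K/V; allocate a block when needed."""
        if self.num_tokens % self.block_size == 0:
            self.block_table.append(self.allocator.allocate())
        block_id = self.block_table[self.num_tokens // self.block_size]
        offset = self.num_tokens % self.block_size
        self.num_tokens += 1
        return block_id, offset  # where this token's K/V would be written


if __name__ == "__main__":
    allocator = BlockAllocator(num_blocks=8)
    seq = PagedKVCache(allocator, block_size=4)
    slots = [seq.append_token() for _ in range(6)]
    print(slots)            # [(7, 0), (7, 1), (7, 2), (7, 3), (6, 0), (6, 1)]
    print(seq.block_table)  # two physical blocks backing 6 logical tokens
```

During attention, the kernel would use the block table to gather each token's K/V from its physical block, which is the part engines like vLLM fuse into a paged-attention GPU kernel.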
Alternatives and similar repositories for MinivLLM
Users interested in MinivLLM are comparing it to the libraries listed below.
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ☆327 · Updated last month
- Simple & Scalable Pretraining for Neural Architecture Research ☆305 · Updated 3 weeks ago
- OpenTinker is an RL-as-a-Service infrastructure for foundation models ☆424 · Updated this week
- ☆466 · Updated 4 months ago
- RL from zero pretrain, can it be done? Yes. ☆282 · Updated 3 months ago
- Exploring Applications of GRPO ☆251 · Updated 4 months ago
- In this repository, I'm going to implement increasingly complex LLM inference optimizations ☆75 · Updated 7 months ago
- ☆84 · Updated 2 weeks ago
- ☆629 · Updated last week
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 · Updated 9 months ago
- QeRL enables RL for 32B LLMs on a single H100 GPU. ☆470 · Updated last month
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆573 · Updated 2 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in language modeling ☆122 · Updated 2 months ago
- dInfer: An Efficient Inference Framework for Diffusion Language Models ☆373 · Updated last week
- An extension of the nanoGPT repository for training small MoE models. ☆222 · Updated 9 months ago
- Quantized LLM training in pure CUDA/C++. ☆226 · Updated this week
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆880 · Updated last week
- CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning ☆277 · Updated last month
- An early research-stage expert-parallel load balancer for MoE models based on linear programming. ☆476 · Updated last month
- A collection of tricks and tools to speed up transformer models ☆194 · Updated 2 weeks ago
- ☆45 · Updated 7 months ago
- Memory-optimized Mixture of Experts ☆72 · Updated 5 months ago
- NanoGPT speedrunning for the poor T4 enjoyers ☆73 · Updated 8 months ago
- ☆225 · Updated last month
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆227 · Updated last month
- GPU documentation for humans ☆430 · Updated 3 weeks ago
- Tina: Tiny Reasoning Models via LoRA ☆310 · Updated 3 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆109 · Updated 9 months ago
- A collection of lightweight interpretability scripts to understand how LLMs think ☆74 · Updated this week
- ☆66 · Updated 9 months ago