ubermenchh / mini-vllm
☆16Updated last month
Alternatives and similar repositories for mini-vllm
Users who are interested in mini-vllm are comparing it to the libraries listed below.
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)☆28Updated 2 years ago
- Some microbenchmarks and design docs before commencement☆12Updated 5 years ago
- ☆47Updated 9 months ago
- A collection of reproducible inference engine benchmarks☆38Updated 9 months ago
- Implements an LLM similar to Meta's Llama 2 from the ground up in PyTorch, for educational purposes.☆38Updated last year
- Manages vllm-nccl dependency☆17Updated last year
- Fast and memory-efficient exact attention ported to rocm☆13Updated 2 years ago
- ☆45Updated 9 months ago
- Make triton easier☆50Updated last year
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang☆61Updated last year
- Learning PyTorch through the D2L book, as a series of notebooks☆28Updated 3 years ago
- Live evaluation of trading agents☆95Updated 2 months ago
- Sample code base for Gemma2 (9B) and Llama3-8B-Finetune-and-RAG, implemented on the Kaggle platform☆22Updated last year
- UC Berkeley Data 140 Textbook☆29Updated last month
- A high-performance kernel library for LLM training☆59Updated 2 weeks ago
- Benchmark suite for LLMs from Fireworks.ai☆89Updated this week
- ☆97Updated 2 weeks ago
- A curated list of the role of small models in the LLM era☆111Updated last year
- Workshop materials for AI Engineer World's Fair☆13Updated 8 months ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning☆65Updated 3 months ago
- LLM Serving Performance Evaluation Harness☆83Updated 11 months ago
- A collection of lightweight interpretability scripts to understand how LLMs think☆89Updated this week
- ☆17Updated 2 years ago
- Building Recommender System with the Two-Tower Architecture☆17Updated 4 years ago
- Vocabulary Parallelism☆25Updated 11 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…☆148Updated last year
- Benchmarking Optimizers for LLM Pretraining☆49Updated last month
- Simple repository for training small reasoning models☆49Updated last year
- Supervised instruction finetuning for LLM with HF trainer and Deepspeed☆36Updated 2 years ago
- A minimal implementation of vllm.☆67Updated last year