A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆249 · Feb 28, 2026 · Updated this week
Alternatives and similar repositories for speculators
Users interested in speculators are comparing it to the libraries listed below.
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆256 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server. ☆53 · Updated this week
- coded with and corrected by Google Anti-Gravity ☆13 · Nov 23, 2025 · Updated 3 months ago
- Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆29 · Dec 11, 2025 · Updated 2 months ago
- ☆45 · Nov 10, 2023 · Updated 2 years ago
- An SSH plugin for Dify ☆13 · Jan 16, 2026 · Updated last month
- 🚀 Sliding Window Attention Training for Efficient Large Language Models ☆16 · Dec 8, 2025 · Updated 2 months ago
- Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i… ☆31 · Updated this week
- MCP server that enables AI assistants to interact with Qwen code ☆23 · Aug 22, 2025 · Updated 6 months ago
- Bagua tutorials. ☆13 · Sep 4, 2022 · Updated 3 years ago
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp ☆16 · Feb 10, 2026 · Updated 3 weeks ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆2,787 · Updated this week
- 3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding ☆83 · Jul 3, 2025 · Updated 8 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆403 · Feb 24, 2026 · Updated last week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆716 · Updated this week
- Common recipes to run vLLM ☆470 · Updated this week
- Longitudinal Evaluation of LLMs via Data Compression ☆33 · May 29, 2024 · Updated last year
- LLM-related stuff, including code and docs ☆13 · Feb 25, 2025 · Updated last year
- The Soft Cosine Measure system developed for the ARQMath-3 shared task evaluation of math information retrieval systems ☆13 · Sep 8, 2022 · Updated 3 years ago
- Optimize GEMM with tensorcore step by step ☆36 · Dec 17, 2023 · Updated 2 years ago
- ☆12 · Mar 8, 2022 · Updated 3 years ago
- A benchmark suite for Graph Machine Learning ☆19 · Oct 8, 2024 · Updated last year
- NVIDIA Inference Xfer Library (NIXL) ☆898 · Updated this week
- Cloud Native Benchmarking of Foundation Models ☆45 · Jul 31, 2025 · Updated 7 months ago
- A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments. ☆30 · Mar 28, 2025 · Updated 11 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆947 · Oct 29, 2025 · Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Dec 4, 2025 · Updated 3 months ago
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25). ☆2,201 · Feb 20, 2026 · Updated last week
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Jan 15, 2024 · Updated 2 years ago
- ☆11 · Feb 25, 2026 · Updated last week
- ☆17 · Mar 2, 2024 · Updated 2 years ago
- Use LLMs to batch-process, generate, or clean data for academic use; supports OCR with various models, including Qwen (Tongyi Qianwen), Moonshot AI, Baidu PaddleOCR, OpenAI, and LLaVA. ☆16 · Sep 15, 2024 · Updated last year
- Code for Robust Fine-tuning (RbFT) ☆17 · Jan 31, 2025 · Updated last year
- ☆71 · Mar 26, 2025 · Updated 11 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆817 · Mar 6, 2025 · Updated 11 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆402 · Aug 13, 2024 · Updated last year
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆327 · Updated this week
- private-machine is an AI companion system with emotion, needs, and goals simulation. Very silly, not based on real science. ☆29 · Updated this week