A high-throughput and memory-efficient inference and serving engine for LLMs
☆39Jun 24, 2026Updated this week
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- Opening tech prediction through machine learning for Starcraft AI☆10Jul 4, 2012Updated 13 years ago
- 2SSP: A Two-Stage Framework for Structured Pruning of LLMs☆21Aug 18, 2025Updated 10 months ago
- Web archiving utility library☆11Jun 19, 2026Updated last week
- ☆12Mar 21, 2024Updated 2 years ago
- ☆41Aug 30, 2019Updated 6 years ago
- ☆13Jul 25, 2024Updated last year
- teler Caddy integrates the powerful security features of teler WAF into the Caddy web server, ensuring your web servers remain secure and…☆17Feb 24, 2025Updated last year
- Official implementation of A cappella: Audio-visual Singing Voice Separation, from BMVC21☆18May 14, 2022Updated 4 years ago
- A concise and efficient AI command-line assistant that supports chat, command generation, and file processing.☆17Sep 16, 2025Updated 9 months ago
- React app for inspecting, building and debugging with the Realtime API☆11Nov 5, 2024Updated last year
- deduplication☆15Feb 20, 2023Updated 3 years ago
- tabular q learning for trading☆12Dec 10, 2018Updated 7 years ago
- An interactive companion toy that engages kids with storytelling, singing, and encouragement for physical activities using advanced AI t…☆10Oct 15, 2024Updated last year
- (AAAI24 oral) Implementation of RPPO (Risk-sensitive PPO) and RPBT (Population-based self-play with RPPO)☆12May 22, 2023Updated 3 years ago
- Waf based on caddy2☆20Jul 21, 2022Updated 3 years ago
- Compact and Agent-Native MoE Training System☆209Updated this week
- Domain-specific framework for performance analysis of parallel programs☆25Mar 23, 2026Updated 3 months ago
- ☆17May 5, 2024Updated 2 years ago
- Freeswitch Speech-To-Text module☆17Mar 14, 2026Updated 3 months ago
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)☆11Nov 18, 2024Updated last year
- ☆75Mar 26, 2025Updated last year
- Implementation of "Audio Retrieval with Natural Language Queries", INTERSPEECH 2021, PyTorch☆26Aug 18, 2023Updated 2 years ago
- One-click Socat installation script that forwards TCP and UDP traffic, with IPv4 and IPv6 support☆14Jul 25, 2025Updated 11 months ago
- Julia implementation of flash-attention operation for neural networks.☆11May 31, 2023Updated 3 years ago
- Sparse Attention with Linear Units☆20Apr 21, 2021Updated 5 years ago
- KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. …☆412Jun 16, 2026Updated last week
- An LLM inference engine, written in C++☆20Mar 30, 2026Updated 2 months ago
- ☆79Dec 15, 2023Updated 2 years ago
- ☆15Jul 11, 2023Updated 2 years ago
- A dynamic GPU memory allocator, suitable for warp synchronized scenarios.☆11Aug 20, 2019Updated 6 years ago
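- OCR text recognition from photos, including image cropping; recognizes Chinese and English, with the best recognition rate among resources currently available online☆14Sep 20, 2016Updated 9 years ago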
- A ROS node that uses face_net to do face_recognition☆12Jul 27, 2016Updated 9 years ago
- Automatic differentiation of FEniCS and Firedrake models in Julia☆14Mar 21, 2021Updated 5 years ago
- SQL Optimizations using MLIR☆12Apr 5, 2020Updated 6 years ago
- FTRL-Proximal Online Learning Algorithm☆15May 22, 2017Updated 9 years ago
- Inspired by Alpha Arena and open-nof1.ai, we want to explore a new approach to AI trading. We will improve the LLMs and use machine le…☆39Dec 16, 2025Updated 6 months ago
- Bagua tutorials.☆13Sep 4, 2022Updated 3 years ago
- An MPI wrapper for the pytorch tensor library that is automatically differentiable☆10Mar 27, 2023Updated 3 years ago
- Generates egress bills for those using an S3 bucket to serve BLOBs☆23Sep 9, 2024Updated last year