A high-throughput and memory-efficient inference and serving engine for LLMs
☆39Aug 30, 2025Updated 8 months ago
Alternatives and similar repositories for vllm
Users who are interested in vllm are comparing it to the libraries listed below.
- 2SSP: A Two-Stage Framework for Structured Pruning of LLMs☆21Aug 18, 2025Updated 8 months ago
- LLMTechSite, a technology ecosystem focused on artificial general intelligence (AGI).☆12Jan 23, 2026Updated 3 months ago
- PyTorch implementation of "Very Deep Graph Neural Networks via Noise Regularisation"☆10Aug 22, 2021Updated 4 years ago
- Fixed version of https://github.com/tomguluson92/PRNet_PyTorch☆10Mar 30, 2020Updated 6 years ago
- Gradually Updated Neural Networks for Large-Scale Image Recognition at ICML 2018☆10Jun 25, 2018Updated 7 years ago
- teler Caddy integrates the powerful security features of teler WAF into the Caddy web server, ensuring your web servers remain secure and…☆17Feb 24, 2025Updated last year
- Gaze decomposition for appearance-based gaze estimation☆12Mar 15, 2020Updated 6 years ago
- This is the official Python implementation repository for a paper entitled "Resolving Camera Position for a Practical Application of Gaz…☆12Jan 11, 2022Updated 4 years ago
- my dockerfiles☆13Mar 29, 2026Updated last month
- A retrieval augmented sequence modeling toolkit implemented based on Fairseq☆29Mar 3, 2023Updated 3 years ago
- Fooocus with support for the Taiyi-Diffusion-XL model☆20Apr 27, 2024Updated 2 years ago
- ☆20Feb 13, 2026Updated 2 months ago
- Model export for GazeML☆12May 21, 2020Updated 5 years ago
- React app for inspecting, building and debugging with the Realtime API☆11Nov 5, 2024Updated last year
- deduplication☆15Feb 20, 2023Updated 3 years ago
- ThinK: Thinner Key Cache by Query-Driven Pruning☆29Feb 11, 2025Updated last year
- (AAAI24 oral) Implementation of RPPO(Risk-sensitive PPO) and RPBT(Population-based self-play with RPPO)☆12May 22, 2023Updated 2 years ago
- ☆33May 26, 2024Updated last year
- VAD algorithm based on ESP32 for mute detection☆13Dec 9, 2018Updated 7 years ago
- This is a template for building Flutter applications for Android, which includes basic dynamic themes, theme settings, language settings …☆11Jun 17, 2025Updated 10 months ago
- WAF based on Caddy 2☆20Jul 21, 2022Updated 3 years ago
- ☆17May 5, 2024Updated 2 years ago
- Domain-specific framework for performance analysis of parallel programs☆25Mar 23, 2026Updated last month
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)☆12Nov 18, 2024Updated last year
- A controlled benchmark on evaluating and studying the dynamics of Long Context Language Models☆26Oct 17, 2025Updated 6 months ago
- Accelerating Long Context LLM Inference with Accuracy-Preserving Context Optimization in SGLang, vLLM, llama.cpp, OpenClaw, RAG, and Agen…☆82Updated this week
- ☆75Mar 26, 2025Updated last year
- Implementation of "Audio Retrieval with Natural Language Queries", INTERSPEECH 2021, PyTorch☆26Aug 18, 2023Updated 2 years ago
- Julia implementation of flash-attention operation for neural networks.☆11May 31, 2023Updated 2 years ago
- An LLM inference engine, written in C++☆19Mar 30, 2026Updated last month
- ☆79Dec 15, 2023Updated 2 years ago
- Automatic differentiation of FEniCS and Firedrake models in Julia☆14Mar 21, 2021Updated 5 years ago
- [EMNLP 2025] CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward☆68Aug 10, 2025Updated 9 months ago
- Sparse symmetric indefinite solver implemented with a runtime system☆13May 11, 2020Updated 5 years ago
- A distributed machine learning algorithm for new word discovery.☆15Jul 21, 2014Updated 11 years ago
- FTRL-Proximal Online Learning Algorithm☆15May 22, 2017Updated 8 years ago
- SQL Optimizations using MLIR☆12Apr 5, 2020Updated 6 years ago
- allowing R users to work with dlib through Rcpp☆13Apr 11, 2018Updated 8 years ago
- Stanford course "Compilers" programming assignment☆13Nov 26, 2014Updated 11 years ago