A high-throughput and memory-efficient inference and serving engine for LLMs
☆13 · Jun 5, 2026 · Updated this week
Alternatives and similar repositories for vllm
Users that are interested in vllm are comparing it to the libraries listed below.
- LLM KV Cache compression - K+V dual compression, 73-99% VRAM savings, zero accuracy loss ☆57 · Mar 30, 2026 · Updated 2 months ago
- The driver for LMCache core to run in vLLM ☆66 · Feb 4, 2025 · Updated last year
- Elaina is a wavefront implementation of walk on stars. (Code for SIGGRAPH 2025 paper "Guiding-Based Importance Sampling for Walk on Stars… ☆28 · Oct 7, 2025 · Updated 8 months ago
- Java 8 Streams C++ port ☆15 · May 9, 2022 · Updated 4 years ago
- This is my Master thesis which evaluates 6D pose estimating deep learning methods for usage in an AR use case. It includes 2 new proxies … ☆17 · Feb 7, 2020 · Updated 6 years ago
- Horizontal Fusion ☆24 · Jan 7, 2022 · Updated 4 years ago
- Huawei collective communication performance tests ☆17 · May 27, 2024 · Updated 2 years ago
- FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854) ☆73 · May 13, 2026 · Updated 3 weeks ago
- PyTorch: training an EfficientNet model with pseudo-labels ☆11 · Dec 28, 2019 · Updated 6 years ago
- Metis: Understanding and Enhancing Regular Expressions in Network ☆14 · Aug 19, 2022 · Updated 3 years ago
- ☆13 · Sep 7, 2024 · Updated last year
- Expert Kit is an efficient foundation of Expert Parallelism (EP) for MoE model Inference on heterogeneous hardware ☆64 · Jun 2, 2026 · Updated last week
- Cluster management tools for the Hydro stack ☆19 · Feb 5, 2021 · Updated 5 years ago
- The audio player for Flutter with a heart of gold ☆13 · May 13, 2023 · Updated 3 years ago
- See vLLM official support: https://github.com/vllm-project/vllm-ascend ☆11 · Feb 5, 2025 · Updated last year
- Source and solution codes for Professional CUDA C Programming book. ☆15 · Aug 20, 2020 · Updated 5 years ago
- (Obsoleted) A speech signal processing library designed for CVE of Rocaloid Project. ☆15 · Dec 7, 2013 · Updated 12 years ago
- Source Code for Partial Interference ☆10 · Dec 17, 2022 · Updated 3 years ago
- A Distributed Analysis and Benchmarking Framework for Apache OpenWhisk Serverless Platform ☆12 · Dec 11, 2018 · Updated 7 years ago
- ☆18 · Feb 18, 2026 · Updated 3 months ago
- The code for paper 'Hierarchical Policy for Non-prehensile Multi-object Rearrangement with Deep Reinforcement Learning and Monte Carlo Tr… ☆21 · Aug 18, 2023 · Updated 2 years ago
- OmniMCP uses Microsoft OmniParser and Model Context Protocol (MCP) to provide AI models with rich UI context and powerful interaction cap… ☆72 · Apr 8, 2025 · Updated last year
- ove2xml is a handy, easy to use application specially designed to help you convert music notation software Overture's document to MusicX… ☆13 · Oct 12, 2015 · Updated 10 years ago
- ☆10 · Updated this week
- Robotic platform for industrial control systems cybersecurity research. We use the research-grade Youbot as the robotics platform for ou… ☆27 · Aug 6, 2015 · Updated 10 years ago
- Repository for implementation of active learning and semi-supervised learning algorithms and applying them to medical imaging datasets ☆16 · May 17, 2021 · Updated 5 years ago
- Code for undergraduate thesis "Active Learning for Deep Object Detection". ☆14 · Nov 12, 2023 · Updated 2 years ago
- EleutherAI ML Performance reading group repository (slides, meeting recordings, annotated papers) ☆32 · Mar 20, 2026 · Updated 2 months ago
- Zoom in Lesions for Better Diagnosis: Attention Guided Deformation Network for WCE Image Classification ☆13 · Aug 4, 2020 · Updated 5 years ago
- Mathematical expression evaluator with just in time code generation. ☆12 · Apr 7, 2013 · Updated 13 years ago
- A simple LaTeX template for CUHK thesis. ☆17 · Apr 24, 2023 · Updated 3 years ago
- [Deprecated] Plugin for the QChatGPT project that switches between models of the same kind ☆21 · Aug 13, 2024 · Updated last year
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- Crawl phone information from Taobao and JD, clean those raw data. Use those data to analyze and compare prices of different phone models.… ☆12 · Apr 15, 2020 · Updated 6 years ago
- Main repository of the BeFaaS project ☆15 · Jun 29, 2023 · Updated 2 years ago
- Official implementation of the papers "User-controlled federated matrix factorization for recommender systems" and "FedeRank: User Contro… ☆18 · Jul 28, 2020 · Updated 5 years ago
- Papers related to the Recommender System from SIGIR 2021 (including the links for Paper PDF, Github Code and Dataset) ☆24 · Jun 9, 2021 · Updated 5 years ago
- ☆12 · Mar 31, 2021 · Updated 5 years ago
- LLM serving cluster simulator ☆153 · Apr 25, 2024 · Updated 2 years ago