vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆95 · Jun 25, 2026 · Updated last week
Alternatives and similar repositories for vllm
Users that are interested in vllm are comparing it to the libraries listed below.
- Slides from 2021-12-15 talk, "TVM Developer Bootcamp – Writing Hardware Backends" ☆11 · Jan 20, 2022 · Updated 4 years ago
- ☆23 · Mar 16, 2026 · Updated 3 months ago
- Torch Frontend for IREE ☆26 · Dec 21, 2023 · Updated 2 years ago
- Fast and memory-efficient exact attention ☆232 · Jun 22, 2026 · Updated last week
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆12 · Jun 24, 2024 · Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆122 · Jun 25, 2026 · Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror ☆538 · Updated this week
- This is the public repo for the MLPerf DeepCAM climate data segmentation proposal. ☆16 · Sep 30, 2025 · Updated 9 months ago
- The AMD rocAL is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a… ☆24 · Jun 22, 2026 · Updated last week
- hipDF - GPU DataFrame Library ☆18 · Mar 16, 2026 · Updated 3 months ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆11 · May 6, 2023 · Updated 3 years ago
- HIP backend patch for Numba, the NumPy aware dynamic Python compiler using LLVM. ☆21 · May 11, 2026 · Updated last month
- Tools for formatting large language model prompts. ☆13 · Dec 19, 2023 · Updated 2 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆17 · Mar 13, 2023 · Updated 3 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆140 · Jun 18, 2026 · Updated 2 weeks ago
- A system validation and diagnostics tool for monitoring, stress testing, detecting, and troubleshooting issues impacting AMD GPUs in high… ☆105 · Updated this week
- ☆15 · Apr 21, 2025 · Updated last year
- ☆16 · Nov 19, 2025 · Updated 7 months ago
- ☆71 · Updated this week
- Fast inference engine for Transformer models ☆57 · Nov 9, 2024 · Updated last year
- High-Performance Linpack Benchmark adopted version for GPU backend ☆12 · Sep 12, 2022 · Updated 3 years ago
- AMD's graph optimization engine. ☆311 · Jun 26, 2026 · Updated last week
- Dorylus: Affordable, Scalable, and Accurate GNN Training ☆76 · May 31, 2021 · Updated 5 years ago
- A pseudo random number generator library written against the SYCL API. ☆11 · Jun 11, 2019 · Updated 7 years ago
- CMake modules used within the ROCm libraries ☆75 · Jun 25, 2026 · Updated last week
- SMT-LIB benchmarks for shape computations from deep learning models in PyTorch ☆18 · Dec 21, 2022 · Updated 3 years ago
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆115 · Jun 28, 2024 · Updated 2 years ago
- OpenMP offload playground ☆10 · Nov 16, 2024 · Updated last year
- ☆12 · Updated this week
- This is the open source version of HPL-MXP. The code performance has been verified on Frontier ☆18 · Jul 9, 2025 · Updated 11 months ago
- ☆346 · Jun 9, 2026 · Updated 3 weeks ago
- ☆28 · May 2, 2023 · Updated 3 years ago
- AMD’s C++ library for accelerating tensor primitives ☆49 · Jun 25, 2026 · Updated last week
- ☆30 · Aug 31, 2022 · Updated 3 years ago
- Guides to hopefully simplify the process of using ROCm. ☆12 · Sep 26, 2024 · Updated last year
- Open source audio recorder and transcriber for MacOS ☆83 · Feb 27, 2026 · Updated 4 months ago
- setup the env for vllm users ☆16 · Oct 31, 2023 · Updated 2 years ago
- CUDA Flux is a profiler for GPU applications which reports the basic block executions frequencies of compute kernels ☆33 · Mar 15, 2021 · Updated 5 years ago
- Quick and easy Diffusers CLI ☆15 · Jun 15, 2026 · Updated 2 weeks ago