HabanaAI / vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
☆85 · Updated this week
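For orientation, here is a minimal sketch of offline inference with the standard vLLM Python API, which this Gaudi fork is assumed to preserve; the model id is illustrative:

```python
# Minimal offline-inference sketch using vLLM's documented Python API.
# Assumption: the Habana fork keeps the upstream interface; model id is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")            # any HF-compatible model id
params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() batches the prompts and returns one RequestOutput per prompt
outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.outputs[0].text)
```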
Alternatives and similar repositories for vllm-fork
Users interested in vllm-fork are comparing it with the libraries listed below.
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Updated 10 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆262 · Updated this week
- A low-latency & high-throughput serving engine for LLMs ☆470 · Updated last month
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆458 · Updated 8 months ago
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu… ☆126 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆136 · Updated 2 weeks ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆36 · Updated 5 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference