ROCm / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆35, updated this week
Related projects:
- Development repository for the Triton language and compiler (☆86, updated this week)
- Fast and memory-efficient exact attention (☆126, updated this week)
- A low-latency & high-throughput serving engine for LLMs (☆174, updated last week)
- Applied AI experiments and examples for PyTorch (☆123, updated last month)
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5) (☆173, updated 3 months ago)
- OpenAI Triton backend for Intel® GPUs (☆126, updated this week)
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving (☆399, updated 2 weeks ago)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆34, updated this week)
- FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens (☆562, updated 2 weeks ago)
- Dynamic Memory Management for Serving LLMs without PagedAttention (☆186, updated last month)
- ☆138, updated 2 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications (☆233, updated this week)
- An experimental CPU backend for Triton (☆36, updated last week)
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity (☆166, updated 11 months ago)
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference (☆106, updated 6 months ago)
- ☆102, updated 3 months ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline (☆81, updated 2 months ago)
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… (☆276, updated last week)
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving (☆258, updated 2 months ago)
- Experimental projects related to TensorRT (☆62, updated this week)
- Code for QuaRot, end-to-end 4-bit inference for large language models (☆256, updated last month)
- Latency and Memory Analysis of Transformer Models for Training and Inference (☆338, updated 3 months ago)
- ☆67, updated last week
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment (☆342, updated this week)
- Zero Bubble Pipeline Parallelism (☆254, updated 2 weeks ago)
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… (☆188, updated 3 weeks ago)
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… (☆56, updated 3 weeks ago)
- ☆140, updated 4 months ago
- A fast communication-overlapping library for tensor parallelism on GPUs (☆184, updated this week)
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ, with easy export to ONNX/ONNX Runtime (☆141, updated 3 weeks ago)