powderluv / vllm-docs

Documentation for vLLM Dev Channel releases

☆9

Alternatives and similar repositories for vllm-docs

Users who are interested in vllm-docs are comparing it to the libraries listed below.

ROCm / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆79 Updated this week
ROCm / aiter
AI Tensor Engine for ROCm
☆201 Updated this week
ROCm / triton
Development repository for the Triton language and compiler
☆123 Updated this week
cray-lm / cray-lm
Cray-LM unified training and inference stack.
☆22 Updated 4 months ago
ROCm / aotriton
Ahead of Time (AOT) Triton Math Library
☆64 Updated last week
ROCm / TransformerEngine
☆37 Updated this week
cchan / tccl
extensible collectives library in triton
☆87 Updated 2 months ago
ROCm / MAD
☆19 Updated this week
ROCm / flash-attention
Fast and memory-efficient exact attention
☆173 Updated this week
ROCm / Megatron-LM
Ongoing research training transformer models at scale
☆22 Updated this week
mk1-project / quickreduce
QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
☆27 Updated 2 months ago
NVIDIA / compute-eval
Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar…
☆45 Updated last month
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆274 Updated last week
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆311 Updated this week
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆198 Updated this week
ROCm / rccl-tests
RCCL Performance Benchmark Tests
☆67 Updated 2 weeks ago
facebookresearch / FAMBench
Benchmarks to capture important workloads.
☆31 Updated 4 months ago
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆121 Updated this week
intel / torch-ccl
oneCCL Bindings for Pytorch*
☆97 Updated last month
ROCm / rocmProfileData
☆24 Updated last month
yifuwang / symm-mem-recipes
☆88 Updated 5 months ago
simveit / effective_transpose
Effective transpose on Hopper GPU
☆20 Updated last month
argonne-lcf / LLM-Inference-Bench
LLM-Inference-Bench
☆43 Updated 5 months ago
foundation-model-stack / fastsafetensors
High-performance safetensors model loader
☆36 Updated this week
wangsiping97 / FastGEMV
High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline.
☆109 Updated 10 months ago
ROCm / pyrsmi
python package of rocm-smi-lib
☆21 Updated 8 months ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆211 Updated last year
triton-lang / kernels
☆80 Updated 7 months ago
intel / torch-xpu-ops
☆46 Updated this week
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆86 Updated this week