powderluv / vllm-docs
Documentation for vLLM Dev Channel releases
☆9 · Updated 6 months ago
Alternatives and similar repositories for vllm-docs
Users interested in vllm-docs are comparing it to the libraries listed below:
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆79 · Updated this week
- AI Tensor Engine for ROCm ☆201 · Updated this week
- Development repository for the Triton language and compiler ☆123 · Updated this week
- Cray-LM unified training and inference stack ☆22 · Updated 4 months ago
- Ahead of Time (AOT) Triton Math Library ☆64 · Updated last week
- Extensible collectives library in Triton ☆87 · Updated 2 months ago
- Fast and memory-efficient exact attention ☆173 · Updated this week
- Ongoing research training transformer models at scale ☆22 · Updated this week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression ☆27 · Updated 2 months ago
- Evaluating Large Language Models for CUDA Code Generation: ComputeEval is a framework designed to generate and evaluate CUDA code from Large Language Models ☆45 · Updated last month
- Applied AI experiments and examples for PyTorch ☆274 · Updated last week
- Fast low-bit matmul kernels in Triton ☆311 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ☆198 · Updated this week
- RCCL Performance Benchmark Tests ☆67 · Updated 2 weeks ago
- Benchmarks to capture important workloads ☆31 · Updated 4 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆121 · Updated this week
- oneCCL Bindings for PyTorch* ☆97 · Updated last month
- Effective transpose on Hopper GPU ☆20 · Updated last month
- LLM-Inference-Bench ☆43 · Updated 5 months ago
- High-performance safetensors model loader ☆36 · Updated this week
- High-speed GEMV kernels, with up to a 2.7x speedup over the PyTorch baseline ☆109 · Updated 10 months ago
- Python package of rocm-smi-lib ☆21 · Updated 8 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆211 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated this week