powderluv / vllm-docs
Documentation for vLLM Dev Channel releases
☆9 Updated 4 months ago
Alternatives and similar repositories for vllm-docs:
Users who are interested in vllm-docs are comparing it to the libraries listed below:
- AI Tensor Engine for ROCm ☆180 Updated this week
- Cray-LM unified training and inference stack. ☆22 Updated 2 months ago
- Development repository for the Triton language and compiler ☆118 Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆75 Updated this week
- ☆18 Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆102 Updated this week
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 Updated 9 months ago
- Applied AI experiments and examples for PyTorch ☆262 Updated last month
- ☆29 Updated this week
- Fast and memory-efficient exact attention ☆171 Updated this week
- Extensible collectives library in Triton ☆85 Updated 3 weeks ago
- Explore training for quantized models ☆18 Updated 3 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆62 Updated last month
- RCCL Performance Benchmark Tests ☆64 Updated this week
- LLM-Inference-Bench ☆40 Updated 3 months ago
- oneCCL Bindings for PyTorch* ☆95 Updated this week
- ☆78 Updated 5 months ago
- ☆22 Updated 2 months ago
- Ahead of Time (AOT) Triton Math Library ☆58 Updated this week
- ☆198 Updated 9 months ago
- Fast low-bit matmul kernels in Triton ☆294 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆182 Updated this week
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline. ☆106 Updated 9 months ago
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆91 Updated this week
- Ongoing research training transformer models at scale ☆18 Updated this week
- ☆39 Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆193 Updated this week
- ☆53 Updated 7 months ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆24 Updated last month
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆150 Updated last week