powderluv / vllm-docs
Documentation for vLLM Dev Channel releases
☆9 · Updated 3 months ago
Alternatives and similar repositories for vllm-docs:
Users interested in vllm-docs are comparing it to the libraries listed below.
- AI Tensor Engine for ROCm ☆119 · Updated this week
- Development repository for the Triton language and compiler ☆114 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆69 · Updated this week
- extensible collectives library in triton ☆84 · Updated 6 months ago
- Cray-LM unified training and inference stack. ☆21 · Updated last month
- Ahead of Time (AOT) Triton Math Library ☆56 · Updated last week
- ☆37 · Updated this week
- Applied AI experiments and examples for PyTorch ☆250 · Updated last week
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆109 · Updated last week
- ☆25 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week
- ☆21 · Updated last month
- ☆112 · Updated this week
- ☆15 · Updated last week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆91 · Updated this week
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆235 · Updated last month
- A tool to configure, launch and manage your machine learning experiments. ☆132 · Updated this week
- Fast low-bit matmul kernels in Triton ☆272 · Updated this week
- ☆22 · Updated last week
- OpenAI Triton backend for Intel® GPUs ☆170 · Updated this week
- ☆73 · Updated 4 months ago
- ☆54 · Updated 6 months ago
- oneCCL Bindings for Pytorch* ☆91 · Updated 2 weeks ago
- Fast and memory-efficient exact attention ☆162 · Updated this week
- Home for OctoML PyTorch Profiler ☆108 · Updated last year
- ☆62 · Updated last month
- ☆57 · Updated 3 months ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆64 · Updated 3 years ago
- Explore training for quantized models ☆17 · Updated 2 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆55 · Updated last month