vllm-project / vllm-spyre
Community-maintained hardware plugin for vLLM on Spyre
☆33 · Updated this week
Alternatives and similar repositories for vllm-spyre
Users who are interested in vllm-spyre compare it to the libraries listed below.
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs ☆12 · Updated 5 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆83 · Updated this week
- A tool to detect infrastructure issues on cloud native AI systems ☆47 · Updated last week
- Cloud Native Benchmarking of Foundation Models ☆42 · Updated last month
- A recommendation model kernel optimizing system ☆10 · Updated 3 months ago
- A hierarchical collective communications library with portable optimizations ☆36 · Updated 9 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆62 · Updated 2 months ago
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning ☆26 · Updated 3 months ago
- ☆23 · Updated 2 months ago
- Magnum IO community repo ☆99 · Updated last month
- CloudAI Benchmark Framework ☆73 · Updated this week
- Systematic and comprehensive benchmarks for LLM systems. ☆33 · Updated last month
- llm-d benchmark scripts and tooling ☆28 · Updated this week
- ☆57 · Updated this week
- RCCL Performance Benchmark Tests ☆76 · Updated last week
- Microsoft Collective Communication Library ☆66 · Updated 10 months ago
- SYCL based CUTLASS implementation for Intel GPUs ☆39 · Updated this week
- This repository contains the results and code for the MLPerf™ Training v3.0 benchmark. ☆12 · Updated 2 years ago
- COCCL: Compression and precision co-aware collective communication library ☆23 · Updated 6 months ago
- Reference implementations of MLPerf™ HPC training benchmarks ☆49 · Updated 7 months ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆117 · Updated this week
- An I/O benchmark for deep Learning applications ☆90 · Updated 2 weeks ago
- This is the public repo for the MLPerf DeepCAM climate data segmentation proposal. ☆16 · Updated 3 years ago
- Offline optimization of your disaggregated Dynamo graph ☆67 · Updated last week
- A Micro-benchmarking Tool for HPC Networks ☆32 · Updated 3 weeks ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆62 · Updated last year
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆33 · Updated 3 weeks ago
- ☆22 · Updated last month
- Provides the examples to write and build Habana custom kernels using the HabanaTools ☆22 · Updated 5 months ago
- NVIDIA NCCL Tests for Distributed Training ☆111 · Updated this week