QuickReduce is a high-performance all-reduce library for AMD ROCm with support for inline compression.
☆38 · Aug 29, 2025 · Updated 6 months ago
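The description above does not explain QuickReduce's actual compression scheme or kernels. As a rough illustration of the general idea of compressing data inline during an all-reduce, here is a minimal pure-Python sketch. It assumes symmetric per-block int8 quantization, which is a common choice but not necessarily what QuickReduce uses. The names `quantize_int8`, `dequantize_int8`, and `compressed_allreduce` are hypothetical, and the "ranks" are simulated as lists in one process rather than GPUs exchanging buffers.

```python
from typing import List, Tuple


def quantize_int8(x: List[float], block: int = 4) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: symmetric per-block int8 quantization.

    Each block of `block` values shares one scale = max|v| / 127, so every
    value maps to an integer code in [-127, 127].
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0 if amax > 0 else 1.0  # avoid division by zero
        scales.append(scale)
        codes.extend(max(-127, min(127, round(v / scale))) for v in chunk)
    return codes, scales


def dequantize_int8(codes: List[int], scales: List[float], block: int = 4) -> List[float]:
    """Hypothetical helper: undo quantize_int8 using each code's block scale."""
    return [c * scales[i // block] for i, c in enumerate(codes)]


def compressed_allreduce(rank_inputs: List[List[float]], block: int = 4) -> List[List[float]]:
    """Simulated sum all-reduce where each rank's buffer is compressed before exchange.

    Assumption: a naive scheme in which every rank receives every other rank's
    compressed payload, dequantizes it, and sums locally. Real libraries use
    ring, tree, or one-shot/two-shot algorithms instead.
    """
    # Each rank compresses its local buffer before "sending" it.
    payloads = [quantize_int8(x, block) for x in rank_inputs]
    n = len(rank_inputs[0])
    total = [0.0] * n
    for codes, scales in payloads:
        for i, v in enumerate(dequantize_int8(codes, scales, block)):
            total[i] += v
    # Every rank ends up holding the same reduced buffer.
    return [list(total) for _ in rank_inputs]


if __name__ == "__main__":
    ranks = [[1.0, -2.0, 3.0, 0.5, 10.0, 0.0],
             [0.5, 0.5, -1.0, 2.0, -4.0, 1.0]]
    print(compressed_allreduce(ranks)[0])
```

The trade-off this illustrates: each element travels as one byte plus a shared per-block scale instead of a full float, at the cost of a small rounding error (at most half a quantization step per rank per element) in the reduced result.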
Alternatives and similar repositories for quickreduce
Users who are interested in quickreduce are comparing it to the libraries listed below.
- ☆49 · Mar 10, 2026 · Updated last week
- AI Tensor Engine for ROCm · ☆385 · Updated this week
- Large Language Model Text Generation Inference on Habana Gaudi · ☆34 · Mar 20, 2025 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-systems repo · ☆26 · Feb 26, 2026 · Updated 3 weeks ago
- Modular RDMA Interface · ☆94 · Updated this week
- ☆10 · Nov 16, 2024 · Updated last year
- ☆12 · May 30, 2025 · Updated 9 months ago
- Benchmarks to capture important workloads. · ☆32 · Mar 6, 2026 · Updated 2 weeks ago
- some mixture of experts architecture implementations · ☆26 · Mar 22, 2024 · Updated 2 years ago
- DeepSeek-V3/R1 inference performance simulator · ☆189 · Mar 27, 2025 · Updated 11 months ago
- The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a … · ☆13 · Feb 27, 2026 · Updated 3 weeks ago
- NVIDIA Inference Xfer Library (NIXL) · ☆945 · Updated this week
- ☆64 · Updated this week
- A TensorFlow Extension: GPU performance tools for TensorFlow. · ☆26 · Jul 27, 2023 · Updated 2 years ago
- Kikofri, a Jekyll Theme, and a fork of Kiko. · ☆17 · Jul 9, 2019 · Updated 6 years ago
- A BUDE virtual-screening benchmark, in many programming models · ☆30 · Oct 15, 2024 · Updated last year
- Transformer Architecture written with CUDA, C++ and LibTorch. · ☆10 · Jul 26, 2025 · Updated 7 months ago
- ☆16 · Nov 10, 2025 · Updated 4 months ago
- An MLIR-based compiler that takes GPU kernels and compiles them to real hardware instructions. Interactive web visualizer included. · ☆119 · Updated this week
- ☆32 · Apr 19, 2025 · Updated 11 months ago
- ☆29 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo · ☆257 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk · ☆263 · Mar 13, 2026 · Updated last week
- ☆47 · Nov 3, 2025 · Updated 4 months ago
- 100 days of CUDA Challenge · ☆49 · Aug 2, 2025 · Updated 7 months ago
- A Micro-benchmarking Tool for HPC Networks · ☆34 · Sep 2, 2025 · Updated 6 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo · ☆113 · Updated this week
- Standalone commandline CLI tool for compiling Triton kernels · ☆20 · Sep 13, 2024 · Updated last year
- ☆23 · Feb 17, 2026 · Updated last month
- NVIDIA NCCL Tests for Distributed Training · ☆138 · Mar 12, 2026 · Updated last week
- Swift package for reading and writing Safetensors files. · ☆12 · Feb 6, 2026 · Updated last month
- ☆163 · Dec 27, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆117 · Updated this week
- FP4 MAC Array · ☆19 · Apr 14, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆25 · Mar 5, 2026 · Updated 2 weeks ago
- extensible collectives library in triton · ☆97 · Mar 31, 2025 · Updated 11 months ago
- An interactive web-based tool for exploring intermediate representations of PyTorch and Triton models · ☆49 · Jan 23, 2026 · Updated last month
- a teaching deep learning framework: the bridge from micrograd to tinygrad · ☆61 · Updated this week
- UnrealCV for image rendering from 3D model · ☆14 · May 21, 2020 · Updated 5 years ago