QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression.
☆38 · Aug 29, 2025 · Updated 9 months ago
Alternatives and similar repositories for quickreduce
Users that are interested in quickreduce are comparing it to the libraries listed below.
- A "standard library" of Triton kernels. ☆26 · Oct 2, 2025 · Updated 8 months ago
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Mar 20, 2025 · Updated last year
- ☆57 · Updated this week
- AI Tensor Engine for ROCm ☆458 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆28 · May 28, 2026 · Updated last week
- A Python client library for the Globus Online Transfer API ☆21 · Apr 20, 2016 · Updated 10 years ago
- Thallium is a C++14 library wrapping Margo, Mercury, and Argobots and providing an object-oriented way to use these libraries. ☆16 · May 4, 2026 · Updated last month
- ☆19 · Apr 16, 2025 · Updated last year
- Modular RDMA Interface ☆130 · Jun 4, 2026 · Updated last week
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆96 · Sep 4, 2024 · Updated last year
- updates have been moved to https://github.com/lzhengchun/TomoGAN ☆12 · Mar 15, 2021 · Updated 5 years ago
- Reconstructing tomography data. Faster! ☆15 · Mar 23, 2026 · Updated 2 months ago
- Benchmarks to capture important workloads. ☆33 · Apr 1, 2026 · Updated 2 months ago
- DeepSeek-V3/R1 inference performance simulator ☆196 · Mar 27, 2025 · Updated last year
- Pty-Chi is a Python library for ptychographic image reconstruction. ☆27 · May 22, 2026 · Updated 2 weeks ago
- The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a … ☆13 · Apr 28, 2026 · Updated last month
- free5GC 5GC & UERANSIM UE / RAN Sample Configuration - Select nearby UPF according to the connected gNodeB ☆11 · Mar 31, 2024 · Updated 2 years ago
- ☆19 · Jan 9, 2018 · Updated 8 years ago
- NVIDIA Inference Xfer Library (NIXL) ☆1,072 · Updated this week
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆12 · Jun 24, 2024 · Updated last year
- A high-performance acceleration library dedicated to large-scale model training on AMD GPUs ☆64 · Jun 4, 2026 · Updated last week
- A BUDE virtual-screening benchmark, in many programming models ☆31 · Oct 15, 2024 · Updated last year
- Browser-based image viewer with a support of arbitrary large images ☆26 · Feb 26, 2024 · Updated 2 years ago
- Project showing how to develop NKI kernels for Llama 3.2 1B inference ☆21 · May 29, 2025 · Updated last year
- ☆33 · Apr 19, 2025 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆259 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆289 · Updated this week
- ☆32 · Updated this week
- ☆30 · May 21, 2026 · Updated 3 weeks ago
- 100 days of CUDA Challenge ☆49 · Aug 2, 2025 · Updated 10 months ago
- A Micro-benchmarking Tool for HPC Networks ☆34 · Sep 2, 2025 · Updated 9 months ago
- ☆29 · Aug 29, 2024 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆113 · Updated this week
- ☆23 · Mar 16, 2026 · Updated 2 months ago
- Achieving Consistent Low Latency for Wireless Real-Time Communications with the Shortest Control Loop (SIGCOMM 2022) ☆20 · Oct 17, 2023 · Updated 2 years ago
- NVIDIA NCCL Tests for Distributed Training ☆146 · Jun 2, 2026 · Updated last week
- ☆168 · Dec 27, 2024 · Updated last year
- FP4 MAC Array ☆19 · Apr 14, 2024 · Updated 2 years ago
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c… ☆27 · Dec 10, 2022 · Updated 3 years ago