Distributed SDDMM Kernel
☆12 · Jul 8, 2022 · Updated 3 years ago
Alternatives and similar repositories for distributed_sddmm
Users interested in distributed_sddmm are comparing it to the libraries listed below.
- [MLSys 2023] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models ☆16 · May 5, 2023 · Updated 2 years ago
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Aug 25, 2023 · Updated 2 years ago
- GPTPU for SC 2021 ☆52 · Mar 22, 2023 · Updated 2 years ago
- An IR for efficiently simulating distributed ML computation. ☆32 · Jan 13, 2024 · Updated 2 years ago
- Reference implementation of the draft C++ GraphBLAS specification. ☆32 · Feb 19, 2025 · Updated last year
- Benchmark for matrix multiplications between dense and block sparse (BSR) matrix in TVM, blocksparse (Gray et al.) and cuSparse. ☆23 · Aug 21, 2020 · Updated 5 years ago
- A GPU algorithm for sparse matrix-matrix multiplication ☆75 · Oct 1, 2020 · Updated 5 years ago
- ☆31 · Jun 15, 2022 · Updated 3 years ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆81 · Aug 8, 2025 · Updated 6 months ago
- ☆32 · Aug 24, 2022 · Updated 3 years ago
- Fast and full-featured Matrix Market I/O library for C++, Python, and R ☆86 · Aug 5, 2024 · Updated last year
- Distributed Multi-GPU GNN Framework ☆36 · Jun 26, 2020 · Updated 5 years ago
- A tool designed to compare energy and emission costs between computer chips ☆13 · Dec 9, 2023 · Updated 2 years ago
- A Roguelike Peer-to-Peer Multi Player Dungeon Explorer Game written in Rust ☆10 · Feb 12, 2022 · Updated 4 years ago
- ☆45 · Nov 10, 2023 · Updated 2 years ago
- ☆12 · Mar 31, 2021 · Updated 4 years ago
- ☆12 · Jul 24, 2024 · Updated last year
- The simulator for SPADA, an SpGEMM accelerator with adaptive dataflow ☆47 · Jan 26, 2023 · Updated 3 years ago
- Unified Sparse Library Wrapper Based on cuSPARSE ☆12 · May 24, 2022 · Updated 3 years ago
- Not just a PDE toolbox. Adapt your ideas from a clean, modular code base with Femeko. ☆15 · Updated this week
- ☆13 · May 8, 2020 · Updated 5 years ago
- Distributed Communication-Optimal Shuffle and Transpose Algorithm ☆14 · Feb 20, 2026 · Updated last week
- Sparse symmetric indefinite solver implemented with a runtime system ☆13 · May 11, 2020 · Updated 5 years ago
- Exercises for the Dafny Tutorial ☆14 · May 21, 2018 · Updated 7 years ago
- An efficient storage system for concurrent graph processing ☆10 · Feb 1, 2021 · Updated 5 years ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025) ☆15 · Jan 6, 2026 · Updated last month
- LaTeX Examples Document Source ☆11 · Apr 9, 2024 · Updated last year
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 · Aug 1, 2021 · Updated 4 years ago
- ☆10 · Nov 21, 2023 · Updated 2 years ago
- [ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation ☆22 · May 29, 2025 · Updated 8 months ago
- [ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design ☆22 · Jul 4, 2025 · Updated 7 months ago
- Easy and efficient 2D spin glass simulation for quantum annealing ☆12 · May 17, 2024 · Updated last year
- ☆13 · Jan 7, 2025 · Updated last year
- ☆10 · Aug 2, 2021 · Updated 4 years ago
- Flexible local Fourier analysis library. ☆12 · Jun 22, 2021 · Updated 4 years ago
- SQL Optimizations using MLIR ☆12 · Apr 5, 2020 · Updated 5 years ago
- See if we can't do some real-time learning for GMRES -- Rejoice! ☆12 · Jun 19, 2022 · Updated 3 years ago
- A prototype of an SSA-based quantum IR exploiting value semantics ☆12 · Jan 23, 2024 · Updated 2 years ago
- FP16-based 2D systolic array circuit design ☆13 · Feb 23, 2023 · Updated 3 years ago