☆10 · Mar 2, 2024 · Updated last year
Alternatives and similar repositories for sparse-register-tiling
Users who are interested in sparse-register-tiling are comparing it to the libraries listed below.
- The repository maintains the source code for the article titled "Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs." ☆15 · Dec 1, 2024 · Updated last year
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 · Aug 1, 2021 · Updated 4 years ago
- A direct convolution library targeting ARM multi-core CPUs. ☆12 · Nov 27, 2024 · Updated last year
- ☆18 · Apr 8, 2022 · Updated 3 years ago
- CAKE Library for constant-bandwidth matrix multiplication on CPUs ☆14 · Apr 6, 2024 · Updated last year
- Nanos6 is a runtime that implements the OmpSs-2 parallel programming model, developed by the System Tools and Advanced Runtimes (STAR) gr… ☆22 · Jun 6, 2025 · Updated 8 months ago
- FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swa… ☆39 · Oct 5, 2025 · Updated 4 months ago
- Official BOLT Repository ☆32 · Aug 16, 2024 · Updated last year
- ☆32 · Aug 24, 2022 · Updated 3 years ago
- PLASMA is a software package for solving problems in dense linear algebra using OpenMP ☆35 · Aug 13, 2025 · Updated 6 months ago
- ☆40 · Feb 28, 2020 · Updated 6 years ago
- ☆40 · Apr 3, 2022 · Updated 3 years ago
- Sparse symmetric indefinite solver implemented with a runtime system ☆13 · May 11, 2020 · Updated 5 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆91 · Nov 23, 2022 · Updated 3 years ago
- Sympiler is a Code Generator for Transforming Sparse Matrix Codes ☆44 · Jul 12, 2023 · Updated 2 years ago
- ☆23 · Dec 30, 2025 · Updated 2 months ago
- SQL Optimizations using MLIR ☆12 · Apr 5, 2020 · Updated 5 years ago
- Official PyTorch implementation of CD-MOE ☆12 · Mar 29, 2025 · Updated 11 months ago
- Flexible local Fourier analysis library. ☆12 · Jun 22, 2021 · Updated 4 years ago
- An MPI wrapper for the pytorch tensor library that is automatically differentiable ☆10 · Mar 27, 2023 · Updated 2 years ago
- LaTeX Examples Document Source ☆11 · Apr 9, 2024 · Updated last year
- ☆12 · Dec 11, 2025 · Updated 2 months ago
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- See if we can't do some real-time learning for GMRES -- Rejoice! ☆12 · Jun 19, 2022 · Updated 3 years ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025) ☆15 · Jan 6, 2026 · Updated last month
- Website for Particle Physics Domain (UCSD Capstone) ☆12 · Oct 23, 2021 · Updated 4 years ago
- rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix) ☆12 · Nov 18, 2024 · Updated last year
- EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation ☆27 · Jul 30, 2025 · Updated 7 months ago
- Bagua tutorials. ☆13 · Sep 4, 2022 · Updated 3 years ago
- Julia implementation of flash-attention operation for neural networks. ☆11 · May 31, 2023 · Updated 2 years ago
- [ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design ☆22 · Jul 4, 2025 · Updated 7 months ago
- Parallel element agglomeration algebraic multigrid upscaling and solvers. ☆16 · Jul 25, 2025 · Updated 7 months ago
- Simple library for manipulating strings using OpenFST ☆12 · Sep 26, 2021 · Updated 4 years ago
- ☆54 · Sep 23, 2020 · Updated 5 years ago
- Example Fenics antenna simulations as part of "Basics of Antenna Modeling with FEniCS Finite Element Suite" ☆13 · Nov 27, 2020 · Updated 5 years ago
- Source code of "FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework" ☆11 · Oct 23, 2024 · Updated last year
- ☆11 · Nov 21, 2020 · Updated 5 years ago
- HiCOPS: Computational framework for peptide identification from MS data through accelerated database search ☆10 · Mar 24, 2023 · Updated 2 years ago
- A debugger to detect and diagnose numerical errors in floating point programs ☆12 · Jun 19, 2022 · Updated 3 years ago