SpRegTiling/sparse-register-tiling

☆10

Alternatives and similar repositories for sparse-register-tiling

Users who are interested in sparse-register-tiling are comparing it to the libraries listed below.

AnonymousYWL / MYLIB
☆18 · Apr 8, 2022 · Updated 4 years ago
HPC4AI / MeAtten
The repository maintains the source code for the article titled "Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs."
☆17 · Dec 1, 2024 · Updated last year
nDIRECT / nDIRECT
A direct convolution library targeting ARM multi-core CPUs.
☆12 · Nov 27, 2024 · Updated last year
eth-cscs / conflux
Distributed Communication-Optimal LU-factorization Algorithm
☆12 · Aug 1, 2021 · Updated 4 years ago
Faraz9877 / H100_GEMM
High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste…
☆11 · Dec 4, 2024 · Updated last year
vnatesh / CAKE_on_CPU
CAKE Library for constant-bandwidth matrix multiplication on CPUs
☆14 · Apr 6, 2024 · Updated 2 years ago
rutgers-apl / fpsanitizer
A debugger to detect and diagnose numerical errors in floating point programs
☆12 · Jun 19, 2022 · Updated 4 years ago
sudo-panda / WABM
☆11 · Nov 21, 2020 · Updated 5 years ago
Stardust-SJF / cuvs_rabitq
cuVS - a library for vector search and clustering on the GPU. The IVF RaBitQ is under the cuvs_ivf_rabitq branch.
☆19 · Jun 18, 2026 · Updated last month
BenChung / Socp.jl
A pure-Julia SOCP solver
☆10 · Dec 30, 2020 · Updated 5 years ago
pnnl / mcl
☆14 · Dec 11, 2025 · Updated 7 months ago
LucasWilkinson / ASpT-mirror
Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding
☆17 · Oct 20, 2021 · Updated 4 years ago
lixiuhong / batched_gemm
☆40 · Feb 28, 2020 · Updated 6 years ago
Sarath18 / raytracer_rust
My implementation of the book Ray Tracing in One Weekend in Rust
☆15 · Sep 7, 2020 · Updated 5 years ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆92 · Nov 23, 2022 · Updated 3 years ago
Ratbuyer / h100-features
☆18 · Mar 12, 2025 · Updated last year
sudo-panda / rrt_planner_ros
A simple implementation of RRT wrapped in ROS.
☆14 · Apr 14, 2020 · Updated 6 years ago
apuaaChen / vectorSparse
☆32 · Aug 24, 2022 · Updated 3 years ago
ParCIS / FlashSparse
FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swa…
☆39 · Oct 5, 2025 · Updated 9 months ago
bsc-pm / nanos6
Nanos6 is a runtime that implements the OmpSs-2 parallel programming model, developed by the System Tools and Advanced Runtimes (STAR) gr…
☆22 · Jun 15, 2026 · Updated last month
HPCRL / ASPLOS_artifact
☆13 · Nov 1, 2021 · Updated 4 years ago
pmodels / bolt
Official BOLT Repository
☆33 · Aug 16, 2024 · Updated last year
sympiler / sympiler
Sympiler is a Code Generator for Transforming Sparse Matrix Codes
☆44 · Jul 12, 2023 · Updated 3 years ago
Olympus-HPC / Mneme
☆23 · Updated this week
shizmob / renpy2linux
(migrated to https://codeberg.org/shiz/renpy2linux) Convert a Windows Ren'Py-based game into a Linux-compatible one.
☆31 · Nov 24, 2021 · Updated 4 years ago
ChenhanYu / rnn
General Stride K-Nearest Neighbors
☆14 · Jun 15, 2021 · Updated 5 years ago
duterscmy / CD-MoE
Official PyTorch implementation of CD-MOE
☆12 · Mar 18, 2026 · Updated 4 months ago
dose78 / CARMA
Communication-Avoiding Recursive Matrix Multiply
☆17 · Jul 10, 2013 · Updated 13 years ago
pigirons / conv3x3_m1
A demo of how to write a high-performance convolution that runs on Apple silicon
☆56 · Feb 8, 2022 · Updated 4 years ago
icl-utk-edu / plasma
PLASMA is a software package for solving problems in dense linear algebra using OpenMP
☆36 · May 11, 2026 · Updated 2 months ago
Deep-Learning-Profiling-Tools / fasten
☆14 · Apr 24, 2024 · Updated 2 years ago
ZIB-IOL / SMS
Code to reproduce the experiments of the ICLR24-paper: "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"
☆12 · Oct 14, 2025 · Updated 9 months ago
jaegertracing / jaeger-analytics-java
Data analytics pipeline and models for tracing data
☆45 · Jul 11, 2024 · Updated 2 years ago
icl-utk-edu / hpcc
HPC Challenge Benchmark
☆70 · Sep 28, 2025 · Updated 9 months ago
fujitsu / dnnl_aarch64
☆54 · Sep 23, 2020 · Updated 5 years ago
IST-DASLab / sparseprop
☆16 · Sep 27, 2023 · Updated 2 years ago
Paramathic / slim
SLiM: One-shot Quantized Sparse Plus Low-rank Approximation of LLMs (ICML 2025)
☆36 · Nov 28, 2025 · Updated 7 months ago
dmort27 / fststr
Simple library for manipulating strings using OpenFST
☆12 · Sep 26, 2021 · Updated 4 years ago
lecoan / pytorch-RLE
An implementation of run-length encoding for PyTorch tensors using CUDA
☆14 · Apr 7, 2021 · Updated 5 years ago