ParCIS/FlashSparse

FlashSparse significantly reduces computation redundancy for unstructured sparsity (in SpMM and SDDMM) on Tensor Cores through a Swap-and-Transpose mapping strategy. FlashSparse was accepted at PPoPP 2025.

☆39
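
For readers unfamiliar with the two kernels named in the description, the following is a minimal reference sketch of what SpMM and SDDMM compute. It is plain Python over a CSR matrix and is not FlashSparse's Tensor Core implementation; the function names `spmm_csr` and `sddmm_csr` and the example data are hypothetical, and the SDDMM shown is the pattern-sampled variant (it does not scale by the stored sparse values, which some definitions do).

```python
# Reference (unoptimized) definitions of SpMM and SDDMM on a CSR matrix.
# Illustrative only: names and data are assumptions, not FlashSparse's API.

def spmm_csr(indptr, indices, data, dense):
    """SpMM: C = A @ B, with sparse A in CSR form and dense B as a list of rows."""
    n_cols = len(dense[0])
    out = []
    for row in range(len(indptr) - 1):
        acc = [0.0] * n_cols
        # Visit only the stored nonzeros of this row of A.
        for k in range(indptr[row], indptr[row + 1]):
            b_row = dense[indices[k]]
            for j in range(n_cols):
                acc[j] += data[k] * b_row[j]
        out.append(acc)
    return out


def sddmm_csr(indptr, indices, x, y):
    """SDDMM: for each stored position (i, j) of the sparse pattern,
    compute dot(x[i], y[j]); the result reuses the same CSR pattern."""
    vals = []
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            vals.append(sum(a * b for a, b in zip(x[row], y[col])))
    return vals


# A = [[1, 0, 2],
#      [0, 0, 3]] stored in CSR form.
indptr, indices, data = [0, 2, 3], [0, 2, 2], [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X = [[1.0, 2.0], [3.0, 4.0]]

C = spmm_csr(indptr, indices, data, B)   # dense 2x2 result
S = sddmm_csr(indptr, indices, X, B)     # one value per stored nonzero of A
print(C)
print(S)
```

Both loops touch only the stored nonzeros, which is the work that sparse kernels of this kind try to map efficiently onto hardware.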

Alternatives and similar repositories for FlashSparse

Users that are interested in FlashSparse are comparing it to the libraries listed below.

Hyaloid / AccSpMM
Official implementation of Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores.
☆17 · Nov 13, 2025 · Updated 8 months ago
CRAFT-THU / RoDe
A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs
☆30 · Nov 29, 2023 · Updated 2 years ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆92 · Nov 23, 2022 · Updated 3 years ago
HPMLL / DTC-SpMM_ASPLOS24
☆47 · Jun 19, 2024 · Updated 2 years ago
SpRegTiling / sparse-register-tiling
☆10 · Mar 2, 2024 · Updated 2 years ago
Deep-Learning-Profiling-Tools / fasten
☆14 · Apr 24, 2024 · Updated 2 years ago
YukeWang96 / TC-GNN_ATC23
Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.
☆58 · Oct 16, 2023 · Updated 2 years ago
HPMLL / SpInfer_EuroSys25
☆35 · Apr 2, 2025 · Updated last year
thunlp / BlockFFN
Source codes for paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
☆19 · Jan 10, 2026 · Updated 6 months ago
alan-hpc / cuda_op_benchmark
An easily extensible framework for understanding and optimizing CUDA operators, intended for learning use only.
☆18 · Jun 13, 2024 · Updated 2 years ago
monellz / FlashTensor
☆19 · Mar 4, 2025 · Updated last year
SuperScientificSoftwareLaboratory / DASP
Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multipli…
☆29 · Jun 18, 2024 · Updated 2 years ago
hatsu3 / Sanger
☆48 · Aug 23, 2021 · Updated 4 years ago
araij / rabbit_order
☆49 · Jan 30, 2026 · Updated 5 months ago
Bruce-Lee-LY / cuda_back2back_hgemm
Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction.
☆13 · Nov 3, 2023 · Updated 2 years ago
vtsynergy / bb_segsort
☆21 · Aug 21, 2023 · Updated 2 years ago
apuaaChen / vectorSparse
☆32 · Aug 24, 2022 · Updated 3 years ago
spcl / DNN-cpp-proxies
C++/MPI proxies for distributed training of deep neural networks.
☆16 · Jun 18, 2022 · Updated 4 years ago
han-shi / SparseBERT
☆13 · Nov 25, 2022 · Updated 3 years ago
google / rago
☆31 · Jun 22, 2025 · Updated last year
microsoft / ConvStencil
☆37 · Apr 10, 2024 · Updated 2 years ago
jshun / ppopp20-ae
☆16 · Feb 26, 2020 · Updated 6 years ago
SuperScientificSoftwareLaboratory / TileSpMV
Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…
☆13 · Aug 12, 2022 · Updated 3 years ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆145 · Mar 31, 2023 · Updated 3 years ago
cornell-zhang / UniSparse
UniSparse: An Intermediate Language for General Sparse Format Customization (OOPSLA'24)
☆34 · Nov 12, 2024 · Updated last year
GhadaSokar / WAST
[NeurIPS2022] Where to Pay Attention in Sparse Training for Feature Selection?
☆12 · Feb 10, 2023 · Updated 3 years ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆72 · Mar 20, 2025 · Updated last year
spcl / smat
Code for High Performance Unstructured SpMM Computation Using Tensor Cores
☆35 · Nov 3, 2024 · Updated last year
AnonymousRepo123 / AlphaSparse
An intelligent matrix format designer for SpMV
☆10 · Oct 10, 2023 · Updated 2 years ago
khaki3 / ptxas-wrapper
A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code
☆16 · Mar 19, 2023 · Updated 3 years ago
zzh-thu-22 / ExtendAttack
[AAAI 2026] This is the official implementation of the paper "ExtendAttack: Attacking Servers of LRMs via Extending Reasoning".
☆25 · Mar 18, 2026 · Updated 4 months ago
horizon-research / imagen
☆10 · Mar 8, 2025 · Updated last year
fpgasystems / fpga-hyperloglog
FPGA-based HyperLogLog Accelerator
☆12 · Jul 13, 2020 · Updated 6 years ago
LucasWilkinson / ASpT-mirror
Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding
☆17 · Oct 20, 2021 · Updated 4 years ago
CMU-SAFARI / PyGim
PyGim is the first runtime framework to efficiently execute Graph Neural Networks (GNNs) on real Processing-in-Memory systems. It provide…
☆36 · Apr 23, 2025 · Updated last year
Ivanrs297 / cuda-spmv-csr
Parallel SpMV using CSR representation, built in CUDA
☆14 · Jun 27, 2020 · Updated 6 years ago
AIS-SNU / GraNNDis_Artifact
[PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min…
☆10 · Aug 13, 2024 · Updated last year
NickdeDycker / EstimoteIndoorAndroid
Estimote Indoor Location finder
☆15 · Jan 29, 2015 · Updated 11 years ago
Siddharth13s / RISC-V_Synthesis_and_Physical_Design
Synthesis (Synopsys DC) and physical design flow (Synopsys ICC II), in 32 nm technology, of the design from my 5-stage pipelined RISC-V repo
☆15 · Jul 31, 2024 · Updated last year