spcl / sten

Sparsity support for PyTorch

☆35

Alternatives and similar repositories for sten

Users who are interested in sten compare it to the libraries listed below.

exists-forall / striped_attention
☆39 Updated last year
Jokeren / triton-samples
☆28 Updated 6 months ago
cchan / tccl
extensible collectives library in triton
☆87 Updated 3 months ago
stanford-futuredata / stk
☆106 Updated 10 months ago
axonn-ai / axonn
A parallel framework for training deep neural networks
☆62 Updated 4 months ago
open-lm-engine / cute-kernels
A bunch of kernels that might make stuff slower 😉
☆55 Updated this week
triton-lang / kernels
☆83 Updated 8 months ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79 Updated last year
awslabs / ratex
☆23 Updated 7 months ago
pytorch-labs / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆187 Updated this week
topal-team / rockmate
☆36 Updated 7 months ago
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆270 Updated 4 years ago
manishucsd / py-codegen
☆16 Updated 9 months ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆90 Updated 2 weeks ago
gpauloski / kfac-pytorch
Distributed K-FAC preconditioner for PyTorch
☆87 Updated last week
spcl / daceml
A Data-Centric Compiler for Machine Learning
☆84 Updated last year
facebookresearch / MODel_opt
Memory Optimizations for Deep Learning (ICML 2023)
☆64 Updated last year
olcf / NVIDIA-tensor-core-examples
☆18 Updated 5 years ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆138 Updated 2 years ago
parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆132 Updated 3 years ago
spcl / substation
Research and development for optimizing transformers
☆129 Updated 4 years ago
HazyResearch / butterfly
Butterfly matrix multiplication in PyTorch
☆172 Updated last year
hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆116 Updated 7 months ago
Dao-AILab / fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
☆206 Updated last year
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆136 Updated 3 months ago
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆79 Updated last week
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22 Updated last year
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆101 Updated last month
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆40 Updated 11 months ago
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆40 Updated 2 months ago