ptillet / torch-blocksparse

Block-sparse primitives for PyTorch

☆157

Alternatives and similar repositories for torch-blocksparse

Users who are interested in torch-blocksparse are comparing it to the libraries listed below.

HazyResearch / butterfly
Butterfly matrix multiplication in PyTorch
☆174 Updated last year
spcl / substation
Research and development for optimizing transformers
☆129 Updated 4 years ago
HazyResearch / fly
☆210 Updated 2 years ago
YulhwaKim / cutlass_tilesparse
CUDA templates for tile-sparse matrix multiplication based on CUTLASS.
☆51 Updated 7 years ago
parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆132 Updated 3 years ago
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆270 Updated 4 years ago
google-research / rigl
End-to-end training of sparse deep neural networks with little-to-no performance loss.
☆324 Updated 2 years ago
albanD / subclass_zoo
☆171 Updated last year
Tiiiger / QPyTorch
Low Precision Arithmetic Simulation in PyTorch
☆282 Updated last year
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆273 Updated 3 years ago
huggingface / pytorch_block_sparse
Fast Block Sparse Matrices for Pytorch
☆548 Updated 4 years ago
kaiyuyue / torchshard
Slicing a PyTorch Tensor Into Parallel Shards
☆299 Updated last month
TezRomacH / layer-to-layer-pytorch
PyTorch implementation of L2L execution algorithm
☆107 Updated 2 years ago
NVIDIA / PyProf
A GPU performance profiling tool for PyTorch models
☆503 Updated 4 years ago
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158 Updated last year
HazyResearch / structured-nets
Structured matrices for compressing neural networks
☆67 Updated last year
DeMoriarty / custom_matmul_kernels
Customized matrix multiplication kernels
☆56 Updated 3 years ago
harvardnlp / genbmm
CUDA kernels for generalized matrix-multiplication in PyTorch
☆85 Updated 3 years ago
mit-han-lab / hardware-aware-transformers
[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
☆336 Updated last year
stevenygd / SWALP
Code for paper "SWALP: Stochastic Weight Averaging for Low-Precision Training".
☆62 Updated 6 years ago
pytorch / nestedtensor
[Prototype] Tools for the concurrent manipulation of variably sized Tensors.
☆251 Updated 2 years ago
stanford-futuredata / stk
☆107 Updated 11 months ago
gpauloski / kfac-pytorch
Distributed K-FAC preconditioner for PyTorch
☆88 Updated last week
GATECH-EIC / Early-Bird-Tickets
[ICLR 2020] Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks
☆138 Updated 4 years ago
pytorch / torchdistx
Torch Distributed Experimental
☆117 Updated last year
jfainberg / hashed_nets
PyTorch implementation of HashedNets
☆36 Updated 2 years ago
utsaslab / MONeT
MONeT framework for reducing memory consumption of DNN training
☆173 Updated 4 years ago
BradMcDanel / sdgp
☆10 Updated 3 years ago
uwsampl / dtr-prototype
Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616
☆132 Updated 2 years ago
epfml / powersgd
Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727
☆147 Updated 9 months ago