spcl / sten
Sparsity support for PyTorch
☆33 Updated last week
Alternatives and similar repositories for sten:
Users interested in sten compare it to the libraries listed below.
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated 10 months ago
- Extensible collectives library in Triton ☆77 Updated 4 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆135 Updated 8 months ago
- ☆64 Updated 2 months ago
- ☆38 Updated last year
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆43 Updated this week
- ☆97 Updated 5 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆61 Updated 2 months ago
- An experiment in using Tangent to autodiff Triton ☆74 Updated last year
- Research and development for optimizing transformers ☆125 Updated 3 years ago
- ☆35 Updated last month
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆37 Updated 2 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆43 Updated 6 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆86 Updated this week
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆133 Updated last year
- A parallel framework for training deep neural networks ☆50 Updated last week
- Fast and memory-efficient exact attention ☆57 Updated last month
- Distributed K-FAC Preconditioner for PyTorch ☆85 Updated this week
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆26 Updated last month
- ☆31 Updated 6 months ago
- ☆16 Updated 5 years ago
- A Data-Centric Compiler for Machine Learning ☆82 Updated last year
- ☆23 Updated 2 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆194 Updated 2 months ago
- ☆12 Updated 3 years ago
- ☆24 Updated 2 weeks ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆102 Updated 2 months ago
- [IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte… ☆51 Updated last year
- The implementation for MLSys 2023 paper: "Cuttlefish: Low-rank Model Training without All The Tuning" ☆43 Updated last year
- A library for unit scaling in PyTorch ☆122 Updated 2 months ago