IST-DASLab / sparseprop
☆15 · Updated 2 years ago
Alternatives and similar repositories for sparseprop
Users interested in sparseprop are comparing it to the libraries listed below.
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆164 · Updated this week
- extensible collectives library in triton ☆88 · Updated 6 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆82 · Updated last year
- Fast low-bit matmul kernels in Triton ☆373 · Updated last week
- Triton-based implementation of Sparse Mixture of Experts. ☆241 · Updated last month
- Collection of kernels written in Triton language ☆155 · Updated 5 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆98 · Updated 3 months ago
- Experiment of using Tangent to autodiff triton ☆81 · Updated last year
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆239 · Updated 3 weeks ago
- Applied AI experiments and examples for PyTorch ☆296 · Updated last month
- Fast and memory-efficient exact attention ☆70 · Updated 7 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆213 · Updated this week
- Official implementation for Training LLMs with MXFP4 ☆91 · Updated 5 months ago
- Code for studying the super weight in LLM ☆119 · Updated 10 months ago
- A block oriented training approach for inference time optimization. ☆34 · Updated last year
- Cataloging released Triton kernels. ☆261 · Updated 3 weeks ago
- ring-attention experiments ☆152 · Updated 11 months ago
- A library for unit scaling in PyTorch ☆130 · Updated 2 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆246 · Updated 8 months ago
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆99 · Updated 3 weeks ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆146 · Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆111 · Updated 11 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆240 · Updated 3 months ago
- This repository contains code for the MicroAdam paper. ☆20 · Updated 9 months ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆330 · Updated 9 months ago