IST-DASLab / sparseprop
☆13 · Updated last year
Alternatives and similar repositories for sparseprop:
Users who are interested in sparseprop are comparing it to the libraries listed below.
- ☆113 · Updated last week
- Boosting 4-bit inference kernels with 2:4 sparsity ☆72 · Updated 7 months ago
- Work in progress. ☆50 · Updated 2 weeks ago
- Explore training for quantized models ☆17 · Updated 2 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆236 · Updated last month
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆165 · Updated 10 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 4 months ago
- Code for studying the super weight in LLMs ☆94 · Updated 4 months ago
- Extensible collectives library in Triton ☆84 · Updated this week
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆105 · Updated 5 months ago
- Experiment using Tangent to autodiff Triton ☆78 · Updated last year
- ☆125 · Updated last year
- A library for unit scaling in PyTorch ☆125 · Updated 4 months ago
- Unit Scaling demo and experimentation code ☆16 · Updated last year
- Official repository for Sparse ISO-FLOP Transformations for Maximizing Training Efficiency ☆25 · Updated 8 months ago
- QuIP quantization ☆52 · Updated last year
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆124 · Updated 7 months ago
- The simplest implementation of recent sparse attention patterns for efficient LLM inference. ☆59 · Updated 2 months ago
- ☆59 · Updated 4 months ago
- Triton implementation of the HyperAttention algorithm ☆47 · Updated last year
- Fast and memory-efficient exact attention ☆67 · Updated last month
- FlexAttention with FlashAttention3 support ☆26 · Updated 5 months ago
- Repo hosting code and materials for speeding up LLM inference using token merging. ☆35 · Updated 11 months ago
- Dynamic Neural Architecture Search Toolkit ☆29 · Updated 4 months ago
- ☆46 · Updated 8 months ago
- ☆122 · Updated last month
- Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆92 · Updated this week
- ☆49 · Updated 2 weeks ago
- Collection of kernels written in the Triton language ☆117 · Updated last month