IST-DASLab / sparseprop
☆15 Updated last year
Alternatives and similar repositories for sparseprop
Users who are interested in sparseprop are comparing it to the libraries listed below.
- This repository contains code for the MicroAdam paper. ☆20 Updated 7 months ago
- Code for studying the super weight in LLM ☆113 Updated 7 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆80 Updated 10 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆137 Updated this week
- ☆74 Updated 3 weeks ago
- The evaluation framework for training-free sparse attention in LLMs ☆83 Updated last month
- Work in progress. ☆71 Updated 3 weeks ago
- ☆140 Updated 3 weeks ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆128 Updated 7 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆109 Updated 9 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆138 Updated 11 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆206 Updated last year
- QuIP quantization ☆54 Updated last year
- Fast and memory-efficient exact attention ☆68 Updated 4 months ago
- ☆49 Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆142 Updated last month
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆17 Updated this week
- ☆15 Updated 3 weeks ago
- extensible collectives library in triton ☆87 Updated 3 months ago
- Experiment of using Tangent to autodiff triton ☆79 Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆55 Updated last week
- PB-LLM: Partially Binarized Large Language Models ☆152 Updated last year
- Collection of kernels written in Triton language ☆136 Updated 3 months ago
- ☆13 Updated last year
- RWKV-7: Surpassing GPT ☆92 Updated 8 months ago
- ☆119 Updated last month
- Unit Scaling demo and experimentation code ☆16 Updated last year
- Code for data-aware compression of DeepSeek models ☆38 Updated last month
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization" ☆87 Updated last month
- A library for unit scaling in PyTorch ☆125 Updated last week