IST-DASLab / SparseFinetuningLinks

Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry

☆42

Alternatives and similar repositories for SparseFinetuning

Users that are interested in SparseFinetuning are comparing it to the libraries listed below

Sorting:

siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆27Updated 2 years ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆86Updated last year
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆44Updated last year
chu-tianxiang / QuIP-for-all
QuIP quantization
☆61Updated last year
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40Updated last year
IST-DASLab / MicroAdam
This repository contains code for the MicroAdam paper.
☆21Updated 11 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆106Updated last month
IST-DASLab / QuEST
Work in progress.
☆75Updated last week
IST-DASLab / Quartet
☆110Updated 2 weeks ago
wdlctc / mini-s
☆53Updated last year
HanGuo97 / lq-lora
☆128Updated last year
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆157Updated 2 years ago
linxihui / dkernel
☆20Updated 7 months ago
softmax1 / Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
☆73Updated last year
Infini-AI-Lab / gsm_infinite
☆55Updated 5 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆46Updated 11 months ago
AnswerDotAI / cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…
☆146Updated last year
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆57Updated last week
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆132Updated last year
FasterDecoding / TEAL
☆154Updated 9 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated last year
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆91Updated 4 months ago
VITA-Group / WeLore
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆51Updated last month
mayank31398 / ladder-residual-inference
☆14Updated 4 months ago
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆118Updated last year
selfsupervised-ai / Natural-GaLore
An extention to the GaLore paper, to perform Natural Gradient Descent in low rank subspace
☆18Updated last year