pilancilab / caldera

Compressing Large Language Models using Low Precision and Low Rank Decomposition

☆104

Alternatives and similar repositories for caldera

Users who are interested in caldera compare it to the repositories listed below.

IST-DASLab / QuEST
Work in progress.
☆74 Updated 4 months ago
Cornell-RelaxML / qtip
☆152 Updated 4 months ago
sebulo / LoQT
☆80 Updated 11 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆102 Updated 2 weeks ago
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆71 Updated 8 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 11 months ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆156 Updated last year
FasterDecoding / BitDelta
☆202 Updated 10 months ago
GATECH-EIC / ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
☆110 Updated last year
wdlctc / headinfer
☆58 Updated 5 months ago
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆119 Updated 11 months ago
minyoungg / LTE
☆69 Updated last year
hahnyuan / ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
☆80 Updated last year
chu-tianxiang / QuIP-for-all
QuIP quantization
☆59 Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127 Updated last year
BorealisAI / flora-opt
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
☆104 Updated last year
NolanoOrg / SpectraSuite
☆51 Updated last year
VITA-Group / WeLore
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…
☆51 Updated this week
dropbox / low-rank-llama2
Low-Rank Llama Custom Training
☆23 Updated last year
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆197 Updated 4 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192 Updated 11 months ago
insuhan / hyper-attn
☆83 Updated last year
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆132 Updated 2 years ago
kyleliang919 / Super_Muon
☆65 Updated 7 months ago
OswaldHe / HMT-pytorch
[NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing"
☆75 Updated 4 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆248 Updated 9 months ago
CASE-Lab-UMD / Unified-MoE-Compression
The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques (TMLR)".
☆78 Updated 7 months ago
jiwonsong-dev / SLEB
[ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
☆37 Updated 8 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202 Updated last year
IST-DASLab / MicroAdam
This repository contains code for the MicroAdam paper.
☆20 Updated 10 months ago