pilancilab / caldera
Compressing Large Language Models using Low Precision and Low Rank Decomposition
☆106 · Updated 3 weeks ago
Alternatives and similar repositories for caldera
Users interested in caldera are comparing it to the repositories listed below.
- ☆159 · Updated 5 months ago
- ☆204 · Updated last year
- QuIP quantization ☆61 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 · Updated last year
- PB-LLM: Partially Binarized Large Language Models ☆157 · Updated 2 years ago
- Fast and memory-efficient exact attention ☆74 · Updated 9 months ago
- Work in progress. ☆75 · Updated 3 weeks ago
- This repository contains code for the MicroAdam paper. ☆21 · Updated last year
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆317 · Updated 3 weeks ago
- Code for studying the super weight in LLM ☆121 · Updated last year
- ☆81 · Updated last year
- Token Omission Via Attention ☆128 · Updated last year
- ☆70 · Updated last year
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆51 · Updated 3 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆106 · Updated 2 months ago
- ☆113 · Updated last month
- ☆66 · Updated 6 months ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆227 · Updated 11 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆249 · Updated 10 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆111 · Updated last year
- [NAACL 2025] Official implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing" ☆77 · Updated 5 months ago
- Normalized Transformer (nGPT) ☆193 · Updated last year
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆82 · Updated last year
- [NeurIPS 2024] Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" ☆233 · Updated 2 months ago
- ☆205 · Updated last year
- [ICML 2024] Official implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆37 · Updated 10 months ago
- ☆155 · Updated 10 months ago
- Low-Rank Llama Custom Training ☆23 · Updated last year
- The homepage of the OneBit model quantization framework. ☆196 · Updated 10 months ago