pilancilab / caldera
Compressing Large Language Models using Low Precision and Low Rank Decomposition
☆97 · Updated 9 months ago
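The one-line description above summarizes the general recipe: represent a weight matrix as a low-precision backbone plus a low-rank correction. Below is a minimal NumPy sketch of that idea (quantize the matrix, then fit a truncated SVD to the quantization residual). It is an illustration under simplified assumptions, not the caldera implementation; the function names, bit width, and rank are hypothetical.

```python
# Conceptual sketch (not the caldera implementation): approximate a weight
# matrix W as Q + L @ R, where Q is a coarsely quantized copy of W and
# L @ R is a low-rank correction fitted to the quantization residual.
import numpy as np

def uniform_quantize(x, n_bits=2):
    """Round x onto a uniform grid with 2**n_bits levels over its own range."""
    levels = 2 ** n_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels
    return np.round((x - lo) / scale) * scale + lo

def low_rank_plus_low_precision(W, n_bits=2, rank=16):
    """Return (Q, L, R) such that W is approximated by Q + L @ R."""
    Q = uniform_quantize(W, n_bits)                 # low-precision backbone
    residual = W - Q                                # error left by quantization
    U, S, Vt = np.linalg.svd(residual, full_matrices=False)
    L = U[:, :rank] * S[:rank]                      # rank-r factors of the residual
    R = Vt[:rank, :]
    return Q, L, R

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 512)).astype(np.float32)
    Q, L, R = low_rank_plus_low_precision(W, n_bits=2, rank=32)
    err_q = np.linalg.norm(W - Q) / np.linalg.norm(W)
    err_qlr = np.linalg.norm(W - (Q + L @ R)) / np.linalg.norm(W)
    print(f"relative error, quantized only:           {err_q:.3f}")
    print(f"relative error, quantized plus low-rank:  {err_qlr:.3f}")
```

Even on random data, the low-rank term recovers part of what coarse quantization throws away, which is the intuition behind pairing the two components; the actual method in the repository differs in how the factors are chosen and quantized.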
Alternatives and similar repositories for caldera
Users interested in caldera are comparing it to the libraries listed below.
- ☆149 · Updated 2 months ago
- Code for studying the super weight in LLM · ☆117 · Updated 9 months ago
- [NAACL 2025] Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing" · ☆75 · Updated 2 months ago
- Work in progress. · ☆72 · Updated 2 months ago
- The evaluation framework for training-free sparse attention in LLMs · ☆91 · Updated 2 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters · ☆129 · Updated 9 months ago
- PB-LLM: Partially Binarized Large Language Models · ☆153 · Updated last year
- Activation-aware Singular Value Decomposition for Compressing Large Language Models · ☆76 · Updated 10 months ago
- ☆202 · Updated 9 months ago
- This repository contains code for the MicroAdam paper. · ☆19 · Updated 8 months ago
- Fast and memory-efficient exact attention · ☆69 · Updated 6 months ago
- ☆69 · Updated last year
- QuIP quantization · ☆58 · Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization · ☆111 · Updated 10 months ago
- Token Omission Via Attention · ☆128 · Updated 10 months ago
- This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" (ICML 2024). · ☆104 · Updated last year
- ☆53 · Updated 10 months ago
- The official implementation of the paper "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques" (TMLR). · ☆74 · Updated 5 months ago
- ☆80 · Updated 9 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" · ☆155 · Updated 10 months ago
- Normalized Transformer (nGPT) · ☆187 · Updated 9 months ago
- Low-Rank Llama Custom Training · ☆23 · Updated last year
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" · ☆245 · Updated 7 months ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs · ☆224 · Updated 7 months ago
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models · ☆177 · Updated 8 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models · ☆228 · Updated 4 months ago
- ☆63 · Updated 5 months ago
- Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks · ☆38 · Updated 7 months ago
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models · ☆298 · Updated 3 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation · ☆51 · Updated last week