[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
☆46Jun 30, 2024Updated last year
Alternatives and similar repositories for ACT
Users that are interested in ACT are comparing it to the libraries listed below
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆156Jul 8, 2025Updated 7 months ago
- [ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis…☆24Jul 21, 2024Updated last year
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models☆93Feb 16, 2025Updated last year
- Code accompanying the paper "Massive Activations in Large Language Models"☆196Mar 4, 2024Updated last year
- Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity☆22Aug 28, 2025Updated 6 months ago
- The official code for [ECCV2020] "HALO: Hardware-aware Learning to Optimize"☆10Mar 22, 2023Updated 2 years ago
- [AAAI 2025] Code for paper: Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation☆28Jan 14, 2025Updated last year
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better☆16Feb 15, 2025Updated last year
- ☆10Oct 28, 2024Updated last year
- ☆66Jan 23, 2026Updated last month
- Data and code for the paper: Finding Safety Neurons in Large Language Models☆21Jan 29, 2026Updated last month
- [ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"☆95Nov 30, 2025Updated 3 months ago
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs☆15Jul 18, 2024Updated last year
- ☆15Apr 2, 2025Updated 11 months ago
- [CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding☆61Aug 31, 2025Updated 6 months ago
- [CVPR 2023] Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference☆30Mar 14, 2024Updated last year
- SysBench: Can Large Language Models Follow System Messages?☆39Sep 4, 2024Updated last year
- ☆13Jun 26, 2024Updated last year
- ☆13Apr 24, 2022Updated 3 years ago
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models☆15Nov 4, 2023Updated 2 years ago
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization☆14Nov 27, 2024Updated last year
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆108May 29, 2025Updated 9 months ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization☆38Sep 24, 2024Updated last year
- Official PyTorch implementation of Rethinking Guidance Information to Utilize Unlabeled Samples: A Label-Encoding Perspective.☆19Sep 27, 2024Updated last year
- [NeurIPS '25] Multi-Token Prediction Needs Registers☆27Dec 14, 2025Updated 2 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆44Apr 18, 2025Updated 10 months ago
- code for Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning☆20Jul 16, 2024Updated last year
- The official implementation of the DAC 2024 paper GQA-LUT☆20Dec 20, 2024Updated last year
- Pytorch Implementation for "Preserving Linear Separability in Continual Learning by Backward Feature Projection" (CVPR 2023)☆18Jun 29, 2023Updated 2 years ago
- [NeurIPS 2025] Official repository for “FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models”☆28Dec 9, 2025Updated 2 months ago
- LoFiT: Localized Fine-tuning on LLM Representations☆44Jan 15, 2025Updated last year
- ☆47Nov 8, 2024Updated last year
- 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts☆41Sep 29, 2024Updated last year
- Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."☆18Dec 13, 2024Updated last year
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models"☆53Feb 10, 2025Updated last year
- ☆23Mar 18, 2024Updated last year
- ☆22Feb 29, 2024Updated 2 years ago
- Are gradient information useful for pruning of LLMs?☆47Aug 23, 2025Updated 6 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆48Jan 17, 2024Updated 2 years ago