locuslab/massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
☆159 · Updated last year
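The paper's central finding is that in many LLMs a tiny number of hidden-state values reach magnitudes thousands of times larger than the typical activation, concentrated at fixed feature dimensions and on a few tokens. Below is a minimal sketch (not the repo's own code) of how one might probe a Hugging Face causal LM for such outliers; the model name and the 1,000× threshold are illustrative assumptions.

```python
# Minimal sketch: scan per-layer hidden states for activations whose
# magnitude dwarfs the layer's median ("massive activations").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative choice; the paper studies several LLM families
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tok("Summer is warm. Winter is cold.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (1, seq_len, hidden_dim) tensors, one per layer.
for layer, h in enumerate(out.hidden_states):
    a = h[0].abs()
    top, med = a.max().item(), a.median().item()
    if top > 1000 * med:  # illustrative threshold for "massive"
        pos = (a == a.max()).nonzero()[0].tolist()
        print(f"layer {layer}: max |act| {top:.1f} vs median {med:.3f} "
              f"at (token {pos[0]}, dim {pos[1]})")
```

Running this prints, for each layer where the check fires, the token position and feature dimension of the outlier, which is enough to see whether the extreme values cluster at fixed dimensions as the paper reports.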
Alternatives and similar repositories for massive-activations:
Users interested in massive-activations are comparing it with the repositories listed below.
- ☆176 · Updated last year
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" · ☆82 · Updated 11 months ago
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" · ☆189 · Updated 9 months ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws · ☆54 · Updated 7 months ago
- Function Vectors in Large Language Models (ICLR 2024) · ☆163 · Updated 2 weeks ago
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models · ☆88 · Updated 11 months ago
- ☆93 · Updated last year
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models" · ☆101 · Updated last year
- Activation-aware Singular Value Decomposition for Compressing Large Language Models · ☆66 · Updated 6 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" · ☆123 · Updated last year
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration · ☆37 · Updated 10 months ago
- [ICLR 2025 Spotlight] When Attention Sink Emerges in Language Models: An Empirical View · ☆72 · Updated 6 months ago
- AI Logging for Interpretability and Explainability 🔬 · ☆115 · Updated 10 months ago
- Official implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks · ☆36 · Updated 3 months ago
- Code and data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR 2025] · ☆105 · Updated 2 months ago
- ☆170 · Updated 2 weeks ago
- ☆52 · Updated 3 weeks ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" · ☆177 · Updated last month
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" · ☆158 · Updated 10 months ago
- ☆47 · Updated last year
- ☆256 · Updated last year
- Repo for the ACL 2023 Findings paper "Emergent Modularity in Pre-trained Transformers" · ☆23 · Updated last year
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely · ☆23 · Updated 10 months ago
- ☆66 · Updated 3 years ago
- ☆91 · Updated 7 months ago
- ☆29 · Updated last year
- Test-time training on nearest neighbors for large language models · ☆40 · Updated last year
- Stick-breaking attention · ☆52 · Updated last month
- ☆220 · Updated 10 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs · ☆409 · Updated last year