locuslab / massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
☆163 · Updated last year
Alternatives and similar repositories for massive-activations
Users who are interested in massive-activations are comparing it to the libraries listed below.
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆88 · Updated 8 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆85 · Updated this week
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆200 · Updated 10 months ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws ☆55 · Updated 8 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆123 · Updated last year
- ☆180 · Updated 2 months ago
- ☆183 · Updated last year
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆421 · Updated last year
- A Sober Look at Language Model Reasoning ☆74 · Updated last week
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆41 · Updated 11 months ago
- [EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models ☆78 · Updated last year
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆163 · Updated last year
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆156 · Updated 3 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆138 · Updated 9 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 · Updated last year
- ☆96 · Updated 9 months ago
- ☆37 · Updated 9 months ago
- ☆18 · Updated 6 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆175 · Updated this week
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆108 · Updated 9 months ago
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆95 · Updated 2 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025] ☆105 · Updated 4 months ago
- ☆29 · Updated last year
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆102 · Updated 2 years ago
- Code release for Dataless Knowledge Fusion by Merging Weights of Language Models (https://openreview.net/forum?id=FCnohuR6AnM) ☆89 · Updated last year
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆71 · Updated 8 months ago
- ☆55 · Updated 6 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆112 · Updated last year
- ☆69 · Updated 3 years ago
- ☆126 · Updated last year