locuslab / massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
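For context, the paper identifies "massive activations": a handful of hidden-state values that are huge in absolute magnitude and orders of magnitude larger than their peers. Below is a minimal, hypothetical sketch (not this repo's actual API) of how one might probe a Hugging Face causal LM for such values; the model name and thresholds are illustrative assumptions.

```python
# Hypothetical sketch -- not this repo's code. Scans each layer's hidden
# states for values that are both large in absolute terms and far above
# that layer's median magnitude, per the paper's informal criterion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumption: any small HF causal LM works for the demo
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

inputs = tok("Summer is warm. Winter is cold.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for layer, h in enumerate(out.hidden_states):  # each h: (1, seq, hidden)
    mags = h[0].abs()
    median = mags.median()
    # Illustrative thresholds: magnitude > 100 and ~1000x the layer median.
    mask = (mags > 100) & (mags > 1000 * median)
    if mask.any():
        tok_idx, dim_idx = mask.nonzero(as_tuple=True)
        print(f"layer {layer}: {mask.sum().item()} massive activation(s), "
              f"e.g. token {tok_idx[0].item()}, dim {dim_idx[0].item()}, "
              f"value {h[0][tok_idx[0], dim_idx[0]].item():.1f}")
```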
Related projects:
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models"
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
- A curated list of Model Merging methods
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality"
- Official implementation of Phi-Mamba, a MOHAWK-distilled model ("Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models")
- Official repository for the ICML 2024 paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors"
- Code release for "Dataless Knowledge Fusion by Merging Weights of Language Models" (https://openreview.net/forum?id=FCnohuR6AnM)
- AI Logging for Interpretability and Explainability 🔬
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
- Code for "NOLA: Compressing LoRA using Linear Combination of Random Basis"
- Source code of the EMNLP 2023 main conference paper "Sparse Low-rank Adaptation of Pre-trained Language Models"
- Language models scale reliably with over-training and on downstream tasks
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal, et al.
- Official implementation of MAIA, a Multimodal Automated Interpretability Agent
- Official PyTorch implementation of "DistiLLM: Towards Streamlined Distillation for Large Language Models" (ICML 2024)
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
- Some preliminary explorations of Mamba's context scaling
- Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models"