Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"
☆10Dec 30, 2024Updated last year
Alternatives and similar repositories for Active-Dormant-Attention
Users who are interested in Active-Dormant-Attention are comparing it to the repositories listed below.
- Example code of Sparse Gaussian Process Attention (ICLR 2023)☆26Sep 15, 2025Updated 6 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025)☆29Feb 6, 2026Updated last month
- Unofficial Implementation of Selective Attention Transformer☆21Oct 31, 2024Updated last year
- Dataset and pre-trained model of EMNLP-IJCNLP 2019 paper "TalkDown: A Corpus for Condescension Detection in Context."☆10Jan 26, 2020Updated 6 years ago
- Everyone loves OS☆19Mar 3, 2026Updated 3 weeks ago
- This is the official code repository for the NeurIPS 2024 spotlight paper "Kermut: Composite kernel regression for protein variant …☆44Aug 5, 2025Updated 7 months ago
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization☆18Mar 7, 2025Updated last year
- Code for Augment & Reduce, a scalable stochastic algorithm for large categorical distributions☆10May 16, 2018Updated 7 years ago
- ☆15Jul 13, 2025Updated 8 months ago
- Official Pytorch implementation of Chromatic Graph Transformers☆10Jun 14, 2023Updated 2 years ago
- Clustered Compositional Embeddings☆11Oct 25, 2023Updated 2 years ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…☆39Sep 22, 2024Updated last year
- Ἀνατομή is a PyTorch library to analyze representation of neural networks☆13Jan 31, 2024Updated 2 years ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts.☆30Sep 12, 2025Updated 6 months ago
- Learning to Skip the Middle Layers of Transformers☆17Aug 7, 2025Updated 7 months ago
- Don't just regulate gradients like in Muon, regulate the weights too☆32Jul 30, 2025Updated 8 months ago
- Implementation of Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems☆14Nov 11, 2023Updated 2 years ago
- A Zen approach to configuring your Python project☆16Feb 27, 2026Updated last month
- ☆12Sep 16, 2024Updated last year
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]☆12Nov 8, 2024Updated last year
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"☆11Apr 15, 2024Updated last year
- ☆10Aug 26, 2022Updated 3 years ago
- Jupyter notebooks from our weekly (or so) hackathons☆11Dec 3, 2024Updated last year
- Code for "Exponential Family Estimation via Adversarial Dynamics Embedding" (NeurIPS 2019)☆14Nov 26, 2019Updated 6 years ago
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data☆14Feb 26, 2025Updated last year
- This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles".☆24Mar 25, 2025Updated last year
- Unofficial Scalable-Softmax Is Superior for Attention☆20May 30, 2025Updated 10 months ago
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention☆54Oct 16, 2025Updated 5 months ago
- Code and dataset for the EMNLP 2024 paper: GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory☆49Sep 26, 2024Updated last year
- ☆16Apr 26, 2023Updated 2 years ago
- Flash Attention in 300-500 lines of CUDA/C++☆36Aug 22, 2025Updated 7 months ago
- Code for Semi-crowdsourced Clustering with Deep Generative Models☆12Dec 9, 2022Updated 3 years ago
- Code accompanying VarGrad: A Low-Variance Gradient Estimator for Variational Inference☆12Oct 12, 2020Updated 5 years ago
- ☆63Mar 21, 2026Updated last week
- ☆26Jun 29, 2025Updated 9 months ago
- ☆12Jan 17, 2024Updated 2 years ago
- Code for the paper "Bayesian Neural Network Priors Revisited"☆60Jul 1, 2021Updated 4 years ago
- Find context neurons in Pythia models.☆13Jun 13, 2023Updated 2 years ago
- Code for experiments on transformers using Markovian data.☆22Nov 22, 2024Updated last year