GuoTianYu2000 / Active-Dormant-Attention
Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"
☆10Dec 30, 2024Updated last year
Alternatives and similar repositories for Active-Dormant-Attention
Users who are interested in Active-Dormant-Attention compare it to the libraries listed below:
- Unofficial Implementation of Selective Attention Transformer☆20Oct 31, 2024Updated last year
- ☆25Jun 29, 2025Updated 7 months ago
- This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles".☆24Mar 25, 2025Updated 10 months ago
- Flash Attention in 300-500 lines of CUDA/C++☆36Aug 22, 2025Updated 5 months ago
- Example code of Sparse Gaussian Process Attention (ICLR 2023)☆26Sep 15, 2025Updated 5 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025)☆29Feb 6, 2026Updated last week
- ☆35Feb 26, 2024Updated last year
- The official implementation of "MLLMs-Augmented Visual-Language Representation Learning"☆31Mar 12, 2024Updated last year
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers.☆11Aug 30, 2024Updated last year
- ☆11Dec 23, 2024Updated last year
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"☆11Apr 15, 2024Updated last year
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…☆39Sep 22, 2024Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".☆12Oct 14, 2024Updated last year
- Concurrency library☆16Oct 13, 2024Updated last year
- Original VinVL visual backbone with simplified APIs to easily extract features, boxes, object detections, in a few lines of Python code.☆11Nov 27, 2022Updated 3 years ago
- Python Inference Script (PyIS)☆19Aug 30, 2022Updated 3 years ago
- A framework for steering MoE models by detecting and controlling behavior-linked experts.☆29Sep 12, 2025Updated 5 months ago
- An active inference model of Lacanian psychoanalysis☆15Jun 7, 2025Updated 8 months ago
- Material parsers and other tools and scripts, initially developed for Grobid Superconductor☆13Feb 21, 2025Updated 11 months ago
- CANdle - a library for using USB-FDCAN dongle and communicating with md80 drives☆14Sep 15, 2025Updated 5 months ago
- Models for packages and the resources they contain.☆14Mar 10, 2024Updated last year
- [AAAI2024] An official PyTorch implementation of the paper: Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Underst…☆13Dec 8, 2024Updated last year
- Develop C++/CUDA extensions with PyTorch like Python scripts☆10Jan 7, 2026Updated last month
- Exploring the minimal architecture required for coherent English language generation.☆12Mar 5, 2025Updated 11 months ago
- ☆10Apr 7, 2024Updated last year
- The implementation for ICLR 2025 Oral: From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions.☆53Aug 9, 2025Updated 6 months ago
- Efficiently Composable Data Augmentation on the GPU with Jax☆42May 16, 2025Updated 9 months ago
- This repo is for CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering.☆14Mar 6, 2024Updated last year
- Interactive, GPU accelerated computation graphs☆12Nov 21, 2024Updated last year
- [IROS2025]Adjacent-view Transformers for Supervised Surround-view Depth Estimation☆14Nov 14, 2025Updated 3 months ago
- R package for metabolic enzyme enrichment analysis☆13Oct 24, 2025Updated 3 months ago
- ☆15Dec 8, 2024Updated last year
- Simple MoE - Day 17 of 365 Days of Repos☆16Jan 17, 2025Updated last year
- PyTorch implementation of our paper accepted by ICML 2023: "Bi-directional Masks for Efficient N:M Sparse Training"☆12Jun 7, 2023Updated 2 years ago
- Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure (NeurIPS 2024) + Arithmetic Transfor…☆14Oct 26, 2025Updated 3 months ago
- ☆11Apr 6, 2024Updated last year
- ☆11Jan 19, 2025Updated last year
- A dependency injection library for Python, aiming for the least amount of magic.☆12Feb 23, 2022Updated 3 years ago
- Triton version of GQA flash attention, based on the tutorial☆12Aug 4, 2024Updated last year