GuoTianYu2000 / Active-Dormant-Attention
Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"
☆10 · Dec 30, 2024 · Updated last year
Alternatives and similar repositories for Active-Dormant-Attention
Users who are interested in Active-Dormant-Attention are comparing it to the repositories listed below
- Unofficial Implementation of Selective Attention Transformer ☆20 · Oct 31, 2024 · Updated last year
- ☆25 · Jun 29, 2025 · Updated 7 months ago
- This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles". ☆24 · Mar 25, 2025 · Updated 10 months ago
- Flash Attention in 300-500 lines of CUDA/C++ ☆36 · Aug 22, 2025 · Updated 5 months ago
- Example code of Sparse Gaussian Process Attention (ICLR 2023) ☆26 · Sep 15, 2025 · Updated 5 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆29 · Feb 6, 2026 · Updated last week
- ☆35 · Feb 26, 2024 · Updated last year
- The official implementation of 《MLLMs-Augmented Visual-Language Representation Learning》 ☆31 · Mar 12, 2024 · Updated last year
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers. ☆11 · Aug 30, 2024 · Updated last year
- Concurrency library ☆16 · Oct 13, 2024 · Updated last year
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆39 · Sep 22, 2024 · Updated last year
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" ☆11 · Apr 15, 2024 · Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆12 · Oct 14, 2024 · Updated last year
- ☆11 · Dec 23, 2024 · Updated last year
- Exploring the minimal architecture required for coherent English language generation. ☆12 · Mar 5, 2025 · Updated 11 months ago
- Develop C++/CUDA extensions with PyTorch like Python scripts ☆10 · Jan 7, 2026 · Updated last month
- Models for packages and the resources they contain. ☆14 · Mar 10, 2024 · Updated last year
- Python Inference Script(PyIS) ☆19 · Aug 30, 2022 · Updated 3 years ago
- CANdle - a library for using USB-FDCAN dongle and communicating with md80 drives ☆14 · Sep 15, 2025 · Updated 5 months ago
- An active inference model of Lacanian psychoanalysis ☆15 · Jun 7, 2025 · Updated 8 months ago
- [AAAI2024] An official pytorch implement of the paper: Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Underst… ☆13 · Dec 8, 2024 · Updated last year
- Material parsers and other tools, scripts Initially developed for Grobid Superconductor ☆13 · Feb 21, 2025 · Updated 11 months ago
- Original VinVL visual backbone with simplified APIs to easily extract features, boxes, object detections, in a few lines of Python code. ☆11 · Nov 27, 2022 · Updated 3 years ago
- ☆10 · Apr 7, 2024 · Updated last year
- A framework for steering MoE models by detecting and controlling behavior-linked experts. ☆29 · Sep 12, 2025 · Updated 5 months ago
- The implementation for ICLR 2025 Oral: From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions. ☆53 · Aug 9, 2025 · Updated 6 months ago
- Efficiently Composable Data Augmentation on the GPU with Jax ☆42 · May 16, 2025 · Updated 9 months ago
- 🧩 Design-Information-Modeling for Kit-of-Parts 🏘️ ☆16 · Updated this week
- ☆13 · Nov 27, 2025 · Updated 2 months ago
- text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image) ☆12 · Oct 15, 2024 · Updated last year
- Interactive, GPU accelerated computation graphs ☆12 · Nov 21, 2024 · Updated last year
- Official Implementation of "The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thou… ☆14 · Jul 2, 2025 · Updated 7 months ago
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- CRISPR, faster, better – The Crackling method for whole-genome target detection ☆10 · Jan 11, 2024 · Updated 2 years ago
- A thread-safe vector database for model inference inside LMDB. ☆15 · Updated this week
- TiC: Exploring Vision Transformer in Convolution ☆11 · Oct 24, 2023 · Updated 2 years ago
- Learning to Skip the Middle Layers of Transformers ☆17 · Aug 7, 2025 · Updated 6 months ago
- SketchINR: A First Look into Sketches as Implicit Neural Representations [CVPR 2024] ☆12 · Aug 19, 2024 · Updated last year
- [NeurIPS 2024] CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition ☆16 · Nov 12, 2025 · Updated 3 months ago