Interpreting the latent space representations of attention head outputs for LLMs
☆39 · Aug 13, 2024 · Updated last year
Alternatives and similar repositories for AttentionLens
Users interested in AttentionLens are comparing it to the repositories listed below
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition" ☆14 · Nov 22, 2024 · Updated last year
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆20 · Jan 19, 2025 · Updated last year
- Tools for understanding how transformer predictions are built layer-by-layer ☆570 · Aug 7, 2025 · Updated 7 months ago
- Repository of IPBench ☆19 · Jan 4, 2026 · Updated 2 months ago
- ☆14 · Apr 29, 2025 · Updated 10 months ago
- see github.com/understanding-search/maze-transformer ☆10 · Dec 8, 2023 · Updated 2 years ago
- ☆15 · May 26, 2025 · Updated 9 months ago
- Data and Code for Paper "Reflect Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality" (EMNLP 2022) ☆11 · Nov 28, 2022 · Updated 3 years ago
- Code for EMNLP 2021 paper "Measuring Association Between Labels and Free-Text Rationales" ☆12 · Sep 12, 2023 · Updated 2 years ago
- This project collects methods that enhance the comparison between AMR graphs. ☆11 · Jun 15, 2023 · Updated 2 years ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- Benchmarking LLM Inference Speeds ☆13 · Feb 4, 2026 · Updated last month
- A weak supervision framework for (partial) labeling functions ☆16 · Jul 15, 2024 · Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆40 · Jun 10, 2024 · Updated last year
- ☆13 · May 26, 2022 · Updated 3 years ago
- Localizing Memorized Sequences in Language Models ☆20 · Oct 15, 2025 · Updated 4 months ago
- Code for our paper "Resources and Evaluations for Multi-Distribution Dense Information Retrieval" ☆16 · Jan 16, 2024 · Updated 2 years ago
- Algebraic value editing in pretrained language models ☆69 · Nov 1, 2023 · Updated 2 years ago
- LLM experiments done during SERI MATS, focusing on activation steering / interpreting activation spaces ☆102 · Sep 21, 2023 · Updated 2 years ago
- Analyzing LLM Alignment via Token distribution shift ☆17 · Jan 26, 2024 · Updated 2 years ago
- ☆52 · Oct 23, 2023 · Updated 2 years ago
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. ☆18 · Apr 25, 2021 · Updated 4 years ago
- This project collects methods that enhance the comparison between AMR graphs. ☆18 · Jun 15, 2023 · Updated 2 years ago
- ☆18 · Oct 6, 2022 · Updated 3 years ago
- ☆19 · Oct 2, 2023 · Updated 2 years ago
- ☆26 · Apr 11, 2023 · Updated 2 years ago
- Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling' ☆27 · May 16, 2025 · Updated 9 months ago
- ☆23 · Jun 13, 2024 · Updated last year
- ☆25 · Dec 20, 2023 · Updated 2 years ago
- The 🌟ANITA project🌟 *(Advanced Natural-based interaction for the ITAlian language)* wants to provide Italian NLP researchers with an im… ☆24 · Sep 11, 2024 · Updated last year
- Easy-to-use MIRAGE code for faithful answer attribution in RAG applications. Paper: https://aclanthology.org/2024.emnlp-main.347/ ☆26 · Mar 10, 2025 · Updated 11 months ago
- Code for the paper "Spectral Editing of Activations for Large Language Model Alignments" ☆29 · Dec 20, 2024 · Updated last year
- Erasing concepts from neural representations with provable guarantees ☆243 · Jan 27, 2025 · Updated last year
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆25 · Feb 16, 2026 · Updated 2 weeks ago
- Code for T-MARS data filtering ☆35 · Aug 23, 2023 · Updated 2 years ago
- https://pypi.org/project/intent-suggestions/ ☆10 · Sep 6, 2022 · Updated 3 years ago
- This is the official PyTorch repo for "UNIREX: A Unified Learning Framework for Language Model Rationale Extraction" (ICML 2022). ☆27 · Feb 14, 2023 · Updated 3 years ago
- ☆30 · Aug 8, 2021 · Updated 4 years ago
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆34 · Oct 28, 2025 · Updated 4 months ago