catherinesyeh / attention-viz
Visualizing query-key interactions in language + vision transformers (VIS 2023)
☆154 · Updated last year
Alternatives and similar repositories for attention-viz
Users interested in attention-viz are comparing it to the repositories listed below.
- Extracting spatial and temporal world models from LLMs ☆257 · Updated 2 years ago
- Website for hosting the Open Foundation Models Cheat Sheet. ☆267 · Updated 5 months ago
- ☆142 · Updated last month
- Emergent world representations: Exploring a sequence model trained on a synthetic task ☆191 · Updated 2 years ago
- ☆109 · Updated 8 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆192 · Updated last year
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", from Google DeepMind ☆177 · Updated last year
- ☆38 · Updated last year
- Scaling Data-Constrained Language Models ☆342 · Updated 3 months ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆129 · Updated 3 years ago
- RuLES: a benchmark for evaluating rule-following in language models ☆238 · Updated 7 months ago
- ☆149 · Updated last year
- Code repository for BlackMamba ☆257 · Updated last year
- LLM-Merging: Building LLMs Efficiently through Merging ☆203 · Updated last year
- ☆128 · Updated last year
- Tools for understanding how transformer predictions are built layer-by-layer ☆532 · Updated 2 months ago
- Official repository of Pretraining Without Attention (BiGS); BiGS is the first model to achieve BERT-level transfer learning on the GLUE … ☆114 · Updated last year
- Repository for code used in the xVal paper ☆144 · Updated last year
- Repo for the paper "Shepherd: A Critic for Language Model Generation" ☆217 · Updated 2 years ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆221 · Updated 10 months ago
- ☆267 · Updated 8 months ago
- A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE). ☆154 · Updated 9 months ago
- Understand and test language model architectures on synthetic tasks. ☆233 · Updated 3 weeks ago
- Editing Models with Task Arithmetic ☆508 · Updated last year
- TART: A plug-and-play Transformer module for task-agnostic reasoning ☆200 · Updated 2 years ago
- ☆134 · Updated last year
- Code for the NeurIPS 2024 paper "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" ☆233 · Updated 2 months ago
- Evaluating LLMs with fewer examples ☆163 · Updated last year
- A mechanistic approach for understanding and detecting factual errors of large language models. ☆46 · Updated last year
- ☆253 · Updated 6 months ago