catherinesyeh / attention-viz
Visualizing query-key interactions in language + vision transformers
☆144 · Updated last year
Alternatives and similar repositories for attention-viz
Users interested in attention-viz are comparing it to the repositories listed below.
- ☆150 · Updated last year
- Scaling Data-Constrained Language Models ☆334 · Updated 7 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆189 · Updated 11 months ago
- Language models scale reliably with over-training and on downstream tasks ☆97 · Updated last year
- TART: A plug-and-play Transformer module for task-agnostic reasoning ☆197 · Updated last year
- Emergent world representations: Exploring a sequence model trained on a synthetic task ☆181 · Updated last year
- ☆120 · Updated 7 months ago
- This is the official repository for Inheritune. ☆111 · Updated 3 months ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆490 · Updated 11 months ago
- Understand and test language model architectures on synthetic tasks. ☆195 · Updated 2 months ago
- Official repository of Pretraining Without Attention (BiGS); BiGS is the first model to achieve BERT-level transfer learning on the GLUE … ☆115 · Updated last year
- ☆189 · Updated this week
- Evaluating LLMs with fewer examples ☆153 · Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆166 · Updated last month
- ☆94 · Updated 3 months ago
- ☆180 · Updated last year
- Code repository for Black Mamba ☆246 · Updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆201 · Updated 5 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆176 · Updated 8 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated last year
- ☆129 · Updated last month
- Implementation of Recurrent Memory Transformer (NeurIPS 2022) in PyTorch ☆407 · Updated 4 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆199 · Updated last week
- A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE). ☆144 · Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆115 · Updated 5 months ago
- Implementation of 🌻 Mirasol, SOTA multimodal autoregressive model out of Google DeepMind, in PyTorch ☆89 · Updated last year
- PASTA: Post-hoc Attention Steering for LLMs ☆117 · Updated 5 months ago
- This is the repo for the paper "Shepherd: A Critic for Language Model Generation" ☆219 · Updated last year
- ☆85 · Updated last year
- RuLES: a benchmark for evaluating rule-following in language models ☆223 · Updated 2 months ago