mattneary / attention
visualizing attention for LLM users
☆162 · Updated last year
Related projects
Alternatives and complementary repositories for attention
- Improving Alignment and Robustness with Circuit Breakers ☆152 · Updated last month
- Controlled Text Generation via Language Model Arithmetic ☆211 · Updated last month
- Evaluating LLMs with fewer examples ☆134 · Updated 7 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆158 · Updated last month
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆106 · Updated last year
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆143 · Updated 3 weeks ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆74 · Updated this week
- Function Vectors in Large Language Models (ICLR 2024) ☆118 · Updated last month
- Extract full next-token probabilities via language model APIs ☆228 · Updated 8 months ago
- ☆124 · Updated 6 months ago
- Erasing concepts from neural representations with provable guarantees ☆208 · Updated 3 weeks ago
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) ☆434 · Updated 9 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆60 · Updated last month
- Scaling Data-Constrained Language Models ☆321 · Updated last month
- Code repository for the c-BTM paper ☆105 · Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆188 · Updated 2 months ago
- Sparse autoencoders ☆336 · Updated 3 weeks ago
- ☆105 · Updated this week
- PASTA: Post-hoc Attention Steering for LLMs ☆107 · Updated 2 months ago
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 · Updated last year
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆210 · Updated 10 months ago
- ☆447 · Updated 2 weeks ago
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ☆105 · Updated last month
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ☆464 · Updated last month
- Steering Llama 2 with Contrastive Activation Addition ☆95 · Updated 5 months ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆92 · Updated last month
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆156 · Updated 6 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆216 · Updated 7 months ago
- [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and inexpensive to use. Specifically d… ☆287 · Updated last year
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆435 · Updated 7 months ago