IINemo / lm-polygraph
☆241 Updated last week
Alternatives and similar repositories for lm-polygraph:
Users who are interested in lm-polygraph are comparing it to the repositories listed below.
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆167 Updated last month
- ☆78 Updated last week
- A toolkit for describing model features and intervening on those features to steer behavior. ☆163 Updated 4 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆219 Updated 5 months ago
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach. ☆193 Updated this week
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. ☆105 Updated last week
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆195 Updated 5 months ago
- How do transformer LMs encode relations? ☆46 Updated last year
- ☆93 Updated last year
- AI Logging for Interpretability and Explainability 🔬 ☆108 Updated 9 months ago
- A library for efficient patching and automatic circuit discovery. ☆59 Updated last month
- Interpretability for sequence generation models 🐛 🔍 ☆410 Updated 4 months ago
- ☆23 Updated last month
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆163 Updated this week
- Steering Llama 2 with Contrastive Activation Addition ☆131 Updated 10 months ago
- Extract full next-token probabilities via language model APIs ☆237 Updated last year
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ☆116 Updated 5 months ago
- ☆104 Updated 10 months ago
- ☆113 Updated 7 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆90 Updated last month
- Improving Alignment and Robustness with Circuit Breakers ☆190 Updated 6 months ago
- Evaluating LLMs with fewer examples ☆147 Updated 11 months ago
- ☆47 Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆42 Updated 5 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆73 Updated last year
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆91 Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆52 Updated 4 months ago
- Code repository for the paper "Mission: Impossible Language Models." ☆50 Updated this week
- ☆124 Updated this week
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆71 Updated last year