IINemo / lm-polygraph
☆218 · Updated this week
Alternatives and similar repositories for lm-polygraph:
Users who are interested in lm-polygraph are comparing it to the libraries listed below.
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. ☆91 · Updated this week
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆160 · Updated this week
- AI Logging for Interpretability and Explainability🔬 ☆102 · Updated 8 months ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆82 · Updated this week
- Steering Llama 2 with Contrastive Activation Addition ☆122 · Updated 8 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆179 · Updated 4 months ago
- How do transformer LMs encode relations? ☆46 · Updated 11 months ago
- ☆149 · Updated this week
- Using sparse coding to find distributed representations used by neural networks. ☆213 · Updated last year
- ☆189 · Updated 11 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆181 · Updated 4 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆137 · Updated 4 months ago
- Full code for the sparse probing paper. ☆53 · Updated last year
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ☆115 · Updated 4 months ago
- ☆46 · Updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆182 · Updated 2 months ago
- ☆203 · Updated 4 months ago
- LLM experiments done during SERI MATS, focusing on activation steering and interpreting activation spaces. ☆87 · Updated last year
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆71 · Updated last year
- ☆109 · Updated 6 months ago
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach. ☆184 · Updated 2 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆182 · Updated 2 months ago
- ☆174 · Updated 2 years ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆318 · Updated 8 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆316 · Updated 3 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆215 · Updated 6 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆83 · Updated 6 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆145 · Updated 2 months ago
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) ☆462 · Updated last year
- ☆89 · Updated last year