IINemo / lm-polygraph
☆393Updated this week
Alternatives and similar repositories for lm-polygraph
Users who are interested in lm-polygraph are comparing it to the libraries listed below.
- Interpretability for sequence generation models 🐛 🔍☆447Updated last month
- Steering vectors for transformer language models in Pytorch / Huggingface☆130Updated 9 months ago
- This repository collects all relevant resources about interpretability in LLMs☆384Updated last year
- Steering Llama 2 with Contrastive Activation Addition☆194Updated last year
- How do transformer LMs encode relations?☆56Updated last year
- ☆136Updated last week
- Unified access to Large Language Model modules using NNsight☆60Updated last week
- ☆196Updated last month
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering☆192Updated 9 months ago
- [ICLR 2025] General-purpose activation steering library☆120Updated 2 months ago
- AI Logging for Interpretability and Explainability🔬☆133Updated last year
- ☆189Updated last year
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".☆302Updated 5 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces☆99Updated 2 years ago
- ☆237Updated last year
- Stanford NLP Python library for understanding and improving PyTorch models via interventions☆834Updated last month
- Using sparse coding to find distributed representations used by neural networks.☆286Updated 2 years ago
- Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants.☆164Updated this week
- ☆116Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets☆224Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods☆145Updated 5 months ago
- ☆57Updated 2 years ago
- List of papers on hallucination detection in LLMs.☆991Updated 2 weeks ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …☆228Updated this week
- ☆216Updated last year
- Sparse probing paper full code.☆65Updated last year
- A resource repository for representation engineering in large language models☆141Updated last year
- Sparse Autoencoder for Mechanistic Interpretability☆284Updated last year
- Repository for the Bias Benchmark for QA dataset.☆132Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆116Updated 9 months ago