IINemo / lm-polygraph
☆369 · Updated this week
Alternatives and similar repositories for lm-polygraph
Users that are interested in lm-polygraph are comparing it to the libraries listed below
- Interpretability for sequence generation models · ☆441 · Updated last month
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. · ☆154 · Updated this week
- Steering Llama 2 with Contrastive Activation Addition · ☆188 · Updated last year
- ☆233 · Updated last year
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering · ☆190 · Updated 8 months ago
- AI Logging for Interpretability and Explainability · ☆129 · Updated last year
- ☆116 · Updated last year
- This repository collects all relevant resources about interpretability in LLMs · ☆374 · Updated 11 months ago
- [ICLR 2025] General-purpose activation steering library · ☆108 · Updated last month
- How do transformer LMs encode relations? · ☆55 · Updated last year
- Steering vectors for transformer language models in Pytorch / Huggingface · ☆125 · Updated 7 months ago
- Stanford NLP Python library for understanding and improving PyTorch models via interventions · ☆819 · Updated last month
- ☆131 · Updated last week
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". · ☆283 · Updated 4 months ago
- ☆175 · Updated 11 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces · ☆98 · Updated 2 years ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets · ☆223 · Updated 11 months ago
- A resource repository for representation engineering in large language models · ☆138 · Updated 11 months ago
- ☆56 · Updated 2 years ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). · ☆373 · Updated last year
- ☆98 · Updated last year
- ☆190 · Updated this week
- List of papers on hallucination detection in LLMs. · ☆969 · Updated 4 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … · ☆216 · Updated last week
- Using sparse coding to find distributed representations used by neural networks. · ☆274 · Updated last year
- ☆26 · Updated 8 months ago
- ☆202 · Updated 10 months ago
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models · ☆569 · Updated last year
- Unified access to Large Language Model modules using NNsight · ☆49 · Updated last week
- Materials for EACL2024 tutorial: Transformer-specific Interpretability · ☆60 · Updated last year