sciai-lab / Truth_is_Universal
☆28 Updated last year
Alternatives and similar repositories for Truth_is_Universal
Users who are interested in Truth_is_Universal are comparing it to the repositories listed below.
- ☆57 Updated 2 years ago
- Materials for EACL2024 tutorial: Transformer-specific Interpretability ☆60 Updated last year
- ☆156 Updated 2 years ago
- Conformal Language Modeling ☆32 Updated last year
- Trains Sparse Autoencoders based on outputs from language models ☆11 Updated last year
- A resource repository for representation engineering in large language models ☆140 Updated last year
- ☆136 Updated this week
- Sparse probing paper full code. ☆65 Updated last year
- ☆102 Updated last year
- ☆19 Updated 3 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆75 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆150 Updated 3 months ago
- ☆40 Updated last year
- ☆179 Updated last year
- [ICLR 2025] General-purpose activation steering library ☆119 Updated 2 months ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆12 Updated 9 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆382 Updated last year
- ☆46 Updated last year
- AI Logging for Interpretability and Explainability🔬 ☆133 Updated last year
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆99 Updated 2 years ago
- How do transformer LMs encode relations? ☆55 Updated last year
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ☆203 Updated 4 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆141 Updated 4 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆84 Updated 8 months ago
- Steering Llama 2 with Contrastive Activation Addition ☆193 Updated last year
- "Understanding Dataset Difficulty with V-Usable Information" (ICML 2022, outstanding paper) ☆88 Updated 2 years ago
- ☆195 Updated last month
- ☆60 Updated 3 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆67 Updated 11 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆56 Updated 3 weeks ago