sciai-lab / Truth_is_Universal
☆26 Updated 10 months ago
Alternatives and similar repositories for Truth_is_Universal
Users who are interested in Truth_is_Universal compare it to the repositories listed below
- ☆55 Updated 2 years ago
- Sparse probing paper full code. ☆60 Updated last year
- ☆148 Updated 2 years ago
- Materials for EACL2024 tutorial: Transformer-specific Interpretability ☆60 Updated last year
- [ICLR 2025] General-purpose activation steering library ☆102 Updated 3 weeks ago
- A resource repository for representation engineering in large language models ☆135 Updated 10 months ago
- ☆36 Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆78 Updated 6 months ago
- ☆229 Updated last year
- Repository for the Bias Benchmark for QA dataset. ☆128 Updated last year
- Using sparse coding to find distributed representations used by neural networks. ☆269 Updated last year
- Steering Llama 2 with Contrastive Activation Addition ☆180 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆146 Updated last month
- ☆174 Updated last year
- This repository collects all relevant resources about interpretability in LLMs ☆372 Updated 10 months ago
- Conformal Language Modeling ☆32 Updated last year
- ☆100 Updated last year
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆97 Updated 2 years ago
- ☆48 Updated last year
- The Prism Alignment Project ☆79 Updated last year
- Aligning AI With Shared Human Values (ICLR 2021) ☆297 Updated 2 years ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆76 Updated 11 months ago
- ☆114 Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆54 Updated 11 months ago
- ☆91 Updated last year
- Modular Pluralism @ EMNLP 2024 ☆20 Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆63 Updated 9 months ago
- ☆44 Updated last year
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆12 Updated 7 months ago
- ☆121 Updated this week