huashen218 / bidirectional-alignment-reading-list
The Survey Paper of "Bidirectional Human-AI Alignment"
☆25 Updated 8 months ago
Alternatives and similar repositories for bidirectional-alignment-reading-list:
Users interested in bidirectional-alignment-reading-list are comparing it to the repositories listed below.
- A resource repository for representation engineering in large language models ☆119 Updated 5 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆72 Updated last month
- ☆155 Updated 5 months ago
- Modular Pluralism @ EMNLP 2024 ☆17 Updated 7 months ago
- ☆164 Updated 10 months ago
- ☆91 Updated 2 months ago
- Machine Theory of Mind Reading List. Built upon EMNLP Findings 2023 Paper: Towards A Holistic Landscape of Situated Theory of Mind in Lar… ☆128 Updated 2 months ago
- This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Models ☆52 Updated last year
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective ☆28 Updated 2 months ago
- ☆48 Updated last year
- ☆53 Updated last year
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts ☆16 Updated last year
- ☆25 Updated 11 months ago
- awesome SAE papers ☆26 Updated 2 months ago
- ☆31 Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆53 Updated 5 months ago
- ☆29 Updated 11 months ago
- Public code repo for COLING 2025 paper "Aligning LLMs with Individual Preferences via Interaction" ☆26 Updated 3 weeks ago
- A library for efficient patching and automatic circuit discovery. ☆62 Updated 2 months ago
- ☆34 Updated last month
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆91 Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆161 Updated last week
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆63 Updated 6 months ago
- ☆41 Updated last year
- Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis" ☆17 Updated 2 months ago
- ☆27 Updated last month
- Collection of Reverse Engineering in Large Model ☆32 Updated 3 months ago
- ☆93 Updated last year
- code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs" ☆113 Updated last year
- ☆40 Updated last year