LiuAmber / RAHF
☆21 · Updated 6 months ago
Alternatives and similar repositories for RAHF:
Users interested in RAHF are comparing it to the repositories listed below.
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆56 · Updated 11 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆52 · Updated 4 months ago
- LoFiT: Localized Fine-tuning on LLM Representations ☆34 · Updated 2 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆107 · Updated 11 months ago
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆28 · Updated 2 weeks ago
- ☆73 · Updated 10 months ago
- Code & Data for our paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆63 · Updated last year
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆23 · Updated 9 months ago
- ☆43 · Updated 5 months ago
- A Survey on the Honesty of Large Language Models ☆56 · Updated 3 months ago
- ☆81 · Updated 2 months ago
- [NeurIPS 2024] How do Large Language Models Handle Multilingualism? ☆29 · Updated 4 months ago
- A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab… ☆61 · Updated last month (a generic logit-lens sketch follows this list)
- ☆29 · Updated 10 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆73 · Updated 2 months ago
- Code for the EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
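
For readers unfamiliar with the Logit Lens technique mentioned in the toolkit entry above, here is a minimal, generic sketch: project each layer's hidden state through the model's final norm and unembedding matrix to read off a per-layer token prediction. The model name, prompt, and attribute paths (`model.model.norm`, `model.lm_head`) are assumptions about a Llama/Qwen-style Hugging Face checkpoint, not the listed toolkit's actual API.

```python
# Generic logit-lens sketch (illustrative, not the toolkit's API):
# decode what each layer "predicts" by projecting its hidden state
# through the final norm and the unembedding of a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"  # assumption: any HF causal LM exposing hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds the embedding output plus one tensor per layer.
for layer_idx, h in enumerate(out.hidden_states):
    last = h[:, -1, :]                               # hidden state at the final position
    logits = model.lm_head(model.model.norm(last))   # logit-lens projection
    top_id = logits.argmax(dim=-1).item()
    print(f"layer {layer_idx:2d} -> {tok.decode([top_id])!r}")
```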