MikaStars39 / FeatureAlignment
FeatureAlignment = Alignment + Mechanistic Interpretability
☆28Updated last month
Alternatives and similar repositories for FeatureAlignment:
Users that are interested in FeatureAlignment are comparing it to the libraries listed below
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024)☆105Updated 11 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆68Updated last month
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆50Updated 11 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.