yifan-h / MechanisticProbe
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
☆12 Updated last year
Alternatives and similar repositories for MechanisticProbe:
Users who are interested in MechanisticProbe are comparing it to the repositories listed below
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆66 Updated 6 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆115 Updated 5 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆42 Updated 3 months ago
- ☆28 Updated 3 months ago
- [ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models ☆16 Updated 7 months ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆23 Updated 5 months ago
- ☆38 Updated 3 months ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆55 Updated last month
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆52 Updated 4 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆55 Updated 2 months ago
- This is a unified platform for performing prompt engineering in large language models (LLMs). ☆12 Updated last month
- [ICML 2024] Language Models Represent Beliefs of Self and Others ☆31 Updated 4 months ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆155 Updated last month
- Rewarded soups official implementation ☆55 Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆42 Updated 6 months ago
- Code for the paper <SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning> ☆48 Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆100 Updated 11 months ago
- Critique-out-Loud Reward Models ☆52 Updated 4 months ago
- ☆41 Updated 3 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆95 Updated 4 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 Updated 2 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆48 Updated 2 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards ☆48 Updated 9 months ago
- A Survey on the Honesty of Large Language Models ☆53 Updated 2 months ago
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g… ☆29 Updated last month
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆90 Updated last year
- Source code for EMNLP2022 paper "Finding Skill Neurons in Pre-trained Transformers via Prompt Tuning". ☆18 Updated last year
- ☆20 Updated 7 months ago
- Implementation of the MATRIX framework (ICML 2024) ☆45 Updated 9 months ago