yifan-h / MechanisticProbe
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
☆14 Updated last year
Alternatives and similar repositories for MechanisticProbe
Users who are interested in MechanisticProbe are comparing it to the repositories listed below.
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆190 Updated 9 months ago
- [NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning" ☆119 Updated last week
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆31 Updated 7 months ago
- ☆178 Updated 5 months ago
- Implementation of the MATRIX framework (ICML 2024) ☆60 Updated last year
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆69 Updated 6 months ago
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments ☆88 Updated 6 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆91 Updated last year
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment" ☆77 Updated 4 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆131 Updated 7 months ago
- A comprehensive collection of process reward models. ☆115 Updated last month
- [2025-TMLR] A Survey on the Honesty of Large Language Models ☆61 Updated 10 months ago
- ☆50 Updated last year
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g… ☆37 Updated 3 months ago
- Repo for Anonymous purpose, pls don't distribute ☆10 Updated last year
- ☆31 Updated 5 months ago
- ☆212 Updated 7 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆46 Updated 4 months ago
- Project of ACL 2025 "UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models" ☆13 Updated 7 months ago
- ☆67 Updated 6 months ago
- [ICML 2025] Official Implementation of GLIDER ☆64 Updated 3 weeks ago
- [AI4MATH@ICML2025] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ☆40 Updated 5 months ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆184 Updated last week
- [AAAI 2025] Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks ☆10 Updated 4 months ago
- ☆20 Updated 8 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆56 Updated 11 months ago
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆47 Updated last month
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆27 Updated 8 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆66 Updated 11 months ago
- [NeurIPS 25] The official implementation of SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning ☆23 Updated last month