mansicer / self-verificationLinks
☆17Updated last month
Alternatives and similar repositories for self-verification
Users that are interested in self-verification are comparing it to the libraries listed below
Sorting:
- ☆55Updated last year
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆199Updated 2 years ago
- Rewarded soups official implementation☆62Updated 2 years ago
- GenRM-CoT: Data release for verification rationales☆67Updated last year
- ☆32Updated last year
- Code for "Reasoning to Learn from Latent Thoughts"☆124Updated 10 months ago
- Natural Language Reinforcement Learning☆101Updated 6 months ago
- Implementation of ICLR 2025 paper "Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation"☆18Updated last year
- A repo for open research on building large reasoning models☆136Updated last week
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆70Updated 10 months ago
- ☆67Updated 11 months ago
- Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment"☆79Updated 8 months ago
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies☆59Updated this week
- ☆13Updated last year
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆115Updated 2 years ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆96Updated last year
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆116Updated 6 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"☆185Updated 8 months ago
- ☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models☆19Updated 8 months ago
- Resources for the Enigmata Project.☆77Updated 5 months ago
- Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"☆46Updated 11 months ago
- Paper collections of the continuous effort start from World Models.☆203Updated last year
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning☆70Updated 6 months ago
- ☆18Updated 2 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆86Updated 8 months ago
- ☆51Updated last year
- This is an official implementation of the paper ``Building Math Agents with Multi-Turn Iterative Preference Learning'' with multi-turn DP…☆32Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆124Updated last year
- Official code for the paper: WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents☆56Updated 2 months ago
- [ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆41Updated 8 months ago