yinyueqin / DenseRewardRLHF-PPO
☆19 · Updated 10 months ago
Alternatives and similar repositories for DenseRewardRLHF-PPO
Users who are interested in DenseRewardRLHF-PPO are comparing it to the libraries listed below.
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆48 · Updated 4 months ago
- ☆117 · Updated last week
- [NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning" ☆25 · Updated 2 months ago
- ☆62 · Updated last month
- VeriGUI: Verifiable Long-Chain GUI Dataset ☆82 · Updated last month
- ☆52 · Updated 6 months ago
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin… ☆64 · Updated 2 months ago
- ☆54 · Updated last year
- MDPO: Overcoming the Training-Inference Divide of Masked Diffusion Language Models ☆37 · Updated last month
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆89 · Updated 6 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆49 · Updated last month
- Official code for the paper: WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents ☆52 · Updated last week
- Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?" ☆49 · Updated 6 months ago
- [NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆51 · Updated 2 months ago
- ☆32 · Updated 6 months ago
- ☆64 · Updated 8 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆122 · Updated 8 months ago
- Multimodal RewardBench ☆55 · Updated 9 months ago
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆86 · Updated 5 months ago
- ☆22 · Updated 6 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆72 · Updated 7 months ago
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆135 · Updated 2 months ago
- A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu… ☆81 · Updated last month
- ☆46 · Updated 11 months ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆50 · Updated 7 months ago
- ☆69 · Updated 5 months ago
- Natural Language Reinforcement Learning ☆100 · Updated 4 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆26 · Updated last month
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆40 · Updated 2 months ago
- SIFT: Grounding LLM Reasoning in Contexts via Stickers ☆57 · Updated 8 months ago