ZhaolinGao / A-PO
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
☆34 Updated 8 months ago
Alternatives and similar repositories for A-PO
Users who are interested in A-PO are comparing it to the repositories listed below.
- GRPO to train long-form QA and instructions with long-form reward model ☆16 Updated 6 months ago
- ☆73 Updated 7 months ago
- ☆132 Updated 2 months ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. ☆98 Updated 2 weeks ago
- ☆49 Updated 9 months ago
- 🔍 Awesome Agentic Search is a curated list of papers, tools, and resources on agentic search—where AI agents plan, search, and reason to… ☆52 Updated 5 months ago
- Natural Language Reinforcement Learning ☆101 Updated 6 months ago
- VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking ☆84 Updated last week
- ☆51 Updated 8 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆51 Updated 6 months ago
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research ☆22 Updated 4 months ago
- ☆21 Updated 8 months ago
- [ACL 2025] A Neural-Symbolic Self-Training Framework ☆117 Updated 8 months ago
- Reinforced Multi-LLM Agents training ☆69 Updated 2 weeks ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆73 Updated 9 months ago
- Discriminative Constrained Optimization for Reinforcing Large Reasoning Models ☆50 Updated 2 months ago
- ☆20 Updated 9 months ago
- ☆51 Updated last year
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆53 Updated 7 months ago
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts ☆25 Updated last year
- Official code for the paper: WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents ☆55 Updated last month
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆131 Updated 9 months ago
- [ICML 2025] Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling ☆11 Updated 8 months ago
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆27 Updated last week
- We introduce BabyVision, a benchmark revealing the infancy of AI vision. ☆162 Updated 2 weeks ago
- instruction-following benchmark for large reasoning models ☆44 Updated 5 months ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆84 Updated last year
- ☆54 Updated 5 months ago
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 Updated 8 months ago
- Reinforcing General Reasoning without Verifiers ☆93 Updated 7 months ago