GuoqingWang1 / IGPO
☆23Updated 2 weeks ago
Alternatives and similar repositories for IGPO
Users who are interested in IGPO are comparing it to the repositories listed below
- [ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling☆20Updated 11 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Updated 11 months ago
- ☆19Updated 8 months ago
- A comprehensive benchmark for evaluating deep research agents on academic survey tasks☆37Updated 2 months ago
- ☆26Updated 7 months ago
- [EMNLP 2025] Verification Engineering for RL in Instruction Following☆41Updated last month
- [ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization☆12Updated 9 months ago
- [ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search".☆33Updated 3 months ago
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24)☆26Updated last month
- ☆14Updated 9 months ago
- Official repository for Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning☆12Updated last year
- ☆23Updated last year
- ☆45Updated last month
- Sotopia-RL: Reward Design for Social Intelligence☆43Updated 3 months ago
- ☆22Updated 4 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆58Updated 5 months ago
- [arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents☆46Updated 4 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆30Updated last year
- ☆38Updated 3 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions☆28Updated last month
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023)☆28Updated last year
- a survey on deep research☆37Updated 2 months ago
- Official implementation of Self-Taught Agentic Long Context Understanding (ACL 2025).☆11Updated 2 months ago
- ☆16Updated last year
- Source code of paper: Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning☆40Updated 4 months ago
- [EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning☆60Updated 2 weeks ago
- Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs"☆33Updated 4 months ago
- [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197)☆30Updated 2 months ago
- ☆25Updated 6 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Updated last year