GuoqingWang1 / IGPO
☆18 Updated 2 weeks ago
Alternatives and similar repositories for IGPO
Users who are interested in IGPO are comparing it to the repositories listed below.
- [ACL 2025 (Findings)] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling ☆20 Updated 10 months ago
- ☆19 Updated 7 months ago
- Official implementation of Self-Taught Agentic Long Context Understanding (ACL 2025). ☆10 Updated last month
- ☆14 Updated 9 months ago
- [ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization ☆12 Updated 9 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 Updated 10 months ago
- ☆19 Updated 3 months ago
- The code for "MoPE: Mixture of Prefix Experts for Zero-Shot Dialogue State Tracking" ☆18 Updated 9 months ago
- ☆17 Updated last year
- ☆16 Updated last year
- ☆44 Updated 5 months ago
- MUA-RL: MULTI-TURN USER-INTERACTING AGENT REINFORCEMENT LEARNING FOR AGENTIC TOOL USE ☆38 Updated last month
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆27 Updated 3 weeks ago
- Sotopia-RL: Reward Design for Social Intelligence ☆43 Updated 2 months ago
- ☆15 Updated last year
- ☆23 Updated last year
- Suri: Multi-constraint instruction following for long-form text generation (EMNLP’24) ☆26 Updated 3 weeks ago
- A comprehensive benchmark for evaluating deep research agents on academic survey tasks ☆32 Updated last month
- [EMNLP 2025] Verification Engineering for RL in Instruction Following ☆40 Updated 3 weeks ago
- Official repository for Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning ☆11 Updated last year
- ☆25 Updated 6 months ago
- ☆44 Updated 3 weeks ago
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆17 Updated 6 months ago
- ☆38 Updated 2 months ago
- RuleRAG: Rule Meets Retrieval-Augmented Generation for Question Answering ☆27 Updated 3 weeks ago
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024) ☆24 Updated 2 weeks ago
- [arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents ☆46 Updated 4 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆37 Updated 2 months ago
- ☆45 Updated last month
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆31 Updated 2 months ago