Alibaba-Quark / SSP
Search Self-Play: Pushing the Frontier of Agent Capability without Supervision
☆90 Updated last month
Alternatives and similar repositories for SSP
Users that are interested in SSP are comparing it to the libraries listed below
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆95 Updated 3 months ago
- 🔧 Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆314 Updated last month
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆148 Updated 8 months ago
- Towards a Unified View of Large Language Model Post-Training ☆200 Updated 5 months ago
- Extrapolating RLVR to General Domains without Verifiers ☆200 Updated 5 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆413 Updated 4 months ago
- [ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆353 Updated 3 weeks ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆164 Updated 4 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆419 Updated 6 months ago
- ☆179 Updated 2 months ago
- ☆333 Updated 8 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆155 Updated last year
- Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models ☆45 Updated 4 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆143 Updated 2 months ago
- A comprehensive collection of process reward models. ☆136 Updated 4 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆55 Updated last year
- Awesome Agentic Search is a curated list of papers, tools, and resources on agentic search, where AI agents plan, search, and reason to… ☆53 Updated 5 months ago
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents ☆298 Updated last week
- ☆432 Updated 3 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆154 Updated 3 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆261 Updated 8 months ago
- REverse-Engineered Reasoning for Open-Ended Generation ☆91 Updated 5 months ago
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning ☆71 Updated 8 months ago
- ☆352 Updated 6 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆74 Updated 9 months ago
- [ACL '25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. ☆88 Updated 11 months ago
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆62 Updated 4 months ago
- Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents ☆220 Updated 9 months ago
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework ☆208 Updated 3 weeks ago
- Official Repository of "Learning what reinforcement learning can't" ☆79 Updated last month