QwenLM / WorldPM
☆86 Updated last month
Alternatives and similar repositories for WorldPM
Users who are interested in WorldPM are comparing it to the repositories listed below.
- Efficient Agent Training for Computer Use ☆106 Updated 3 weeks ago
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents ☆135 Updated last week
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆142 Updated 2 weeks ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent ☆58 Updated last month
- The source code and dataset mentioned in the paper Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmar… ☆50 Updated 7 months ago
- ☆42 Updated 8 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆159 Updated 3 weeks ago
- ☆47 Updated 2 weeks ago
- An Open Math Pre-training Dataset with 370B Tokens. ☆89 Updated 2 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆106 Updated 5 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆152 Updated 3 weeks ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso… ☆116 Updated 3 months ago
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis ☆66 Updated 3 weeks ago
- ☆103 Updated 6 months ago
- ☆77 Updated 2 months ago
- ☆95 Updated 6 months ago
- ☆273 Updated 3 weeks ago
- ☆55 Updated last week
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale ☆251 Updated 3 weeks ago
- [ICML 2025] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search ☆99 Updated 3 weeks ago
- Reformatted Alignment ☆113 Updated 9 months ago
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper ☆137 Updated 11 months ago
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆147 Updated 3 weeks ago
- ☆94 Updated 6 months ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆57 Updated 8 months ago
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆44 Updated 4 months ago
- ☆38 Updated 2 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆93 Updated 2 weeks ago
- FuseAI Project ☆87 Updated 5 months ago
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking ☆85 Updated 2 weeks ago