TheAgentArk / Toucan
Official repo of Toucan: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
☆223Updated last month
Alternatives and similar repositories for Toucan
Users who are interested in Toucan are comparing it to the repositories listed below.
- MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.☆378Updated last week
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning☆71Updated 8 months ago
- ☆229Updated 2 weeks ago
- ☆169Updated 2 weeks ago
- The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"☆100Updated 4 months ago
- ☆56Updated 4 months ago
- Deep Research☆303Updated 5 months ago
- Omni Model Benchmark with high quality and diversity, which reveals the Compositional Law. We’re now focused on Chinese scenarios — and a…☆74Updated 3 weeks ago
- [FSE'2026] SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks☆144Updated last week
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆182Updated 6 months ago
- [NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge☆98Updated last month
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆260Updated 9 months ago
- MiroRL is an MCP-first reinforcement learning framework for deep research agents.☆226Updated 5 months ago
- Extrapolating RLVR to General Domains without Verifiers☆196Updated 5 months ago
- ☆219Updated 8 months ago
- ☆229Updated last month
- Official resources of "The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reaso…☆16Updated 7 months ago
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis☆118Updated 8 months ago
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework☆205Updated 2 weeks ago
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents☆133Updated 10 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models☆158Updated 7 months ago
- A repo for open research on building large reasoning models☆133Updated last week
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond☆191Updated 6 months ago
- ☆72Updated 7 months ago
- A comprehensive benchmark for evaluating deep research agents on academic survey tasks☆49Updated 5 months ago
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution☆104Updated 4 months ago
- Scaling Long-Horizon LLM Agent via Context-Folding☆106Updated last week
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models☆58Updated 6 months ago
- ☆108Updated last month
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents☆298Updated this week