open-compass / GTA
[NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents
☆118 · Updated 4 months ago
Alternatives and similar repositories for GTA
Users who are interested in GTA are comparing it to the libraries listed below.
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆118 · Updated 6 months ago
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆236 · Updated last week
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆81 · Updated 2 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆109 · Updated 2 months ago
- ☆144 · Updated 2 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆148 · Updated 8 months ago
- ☆67 · Updated 2 months ago
- ☆271 · Updated 3 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆128 · Updated 4 months ago
- ☆200 · Updated 2 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆122 · Updated last month
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis ☆155 · Updated last month
- ☆317 · Updated 2 months ago
- ☆206 · Updated 6 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆73 · Updated last month
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆234 · Updated 3 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆248 · Updated 3 months ago
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings) ☆66 · Updated this week
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆147 · Updated last month
- ☆325 · Updated 3 weeks ago
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios ☆60 · Updated 2 weeks ago
- A version of verl to support tool use ☆333 · Updated this week
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆172 · Updated 2 weeks ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso… ☆124 · Updated 5 months ago
- ☆103 · Updated 8 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆77 · Updated 5 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆161 · Updated 5 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆188 · Updated 5 months ago
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL. ☆36 · Updated this week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆127 · Updated 5 months ago