[ICLR 2025] Benchmarking Agentic Workflow Generation
☆144 · Feb 19, 2025 · Updated last year
Alternatives and similar repositories for WorfBench
Users who are interested in WorfBench are comparing it to the libraries listed below.
- Official code repository for the paper "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind" ☆22 · Sep 25, 2025 · Updated 5 months ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning ☆46 · Jul 17, 2025 · Updated 7 months ago
- Plancraft is a Minecraft environment and agent suite for testing the planning capabilities of LLMs ☆26 · Nov 7, 2025 · Updated 3 months ago
- 🔥🔥🔥 ICLR 2025 Oral. Automating Agentic Workflow Generation. ☆436 · Dec 25, 2025 · Updated 2 months ago
- ☆28 · Nov 10, 2025 · Updated 3 months ago
- An open platform for enhancing the capability of LLMs in workflow orchestration. ☆184 · Mar 11, 2025 · Updated 11 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆396 · May 20, 2024 · Updated last year
- ☆84 · Sep 11, 2024 · Updated last year
- ☆39 · Aug 6, 2025 · Updated 6 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆164 · Dec 17, 2024 · Updated last year
- ☆33 · Jul 15, 2025 · Updated 7 months ago
- TraceWeaver is a research prototype for transparently tracing requests through a microservice without application instrumentation. ☆23 · Sep 2, 2024 · Updated last year
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings) ☆75 · Aug 20, 2025 · Updated 6 months ago
- Time-R1: Framework and resources for endowing LLMs with comprehensive temporal reasoning (understanding, prediction, creative generation)… ☆64 · Jun 11, 2025 · Updated 8 months ago
- AWM: Agent Workflow Memory ☆403 · Dec 22, 2025 · Updated 2 months ago
- Paper: “MEMRL: SELF-EVOLVING AGENTS VIA RUNTIME REINFORCEMENT LEARNING ON EPISODIC MEMORY” Open-Source Code ☆36 · Updated this week
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 8 months ago
- ☆12 · Mar 5, 2025 · Updated 11 months ago
- Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Mo… ☆15 · Nov 11, 2024 · Updated last year
- A holistic framework for advancing LLMs as data science agents ☆33 · Feb 3, 2026 · Updated last month
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆124 · Aug 26, 2025 · Updated 6 months ago
- ☆31 · Jun 12, 2024 · Updated last year
- Graphlit Platform ☆30 · Feb 20, 2024 · Updated 2 years ago
- ☆46 · Jun 11, 2025 · Updated 8 months ago
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step ☆304 · Apr 3, 2024 · Updated last year
- This is the repository for the Tool Learning survey. ☆481 · Aug 9, 2025 · Updated 6 months ago
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆31 · Jan 25, 2026 · Updated last month
- ☆34 · Jan 25, 2026 · Updated last month
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… ☆15 · Feb 4, 2025 · Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆17 · Apr 2, 2025 · Updated 11 months ago
- An interactive thinking and deep reasoning model. It provides a cognitive reasoning paradigm for complex multi-hop problems. ☆79 · Nov 14, 2025 · Updated 3 months ago
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi… ☆14 · Jun 6, 2025 · Updated 8 months ago
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆58 · Jul 24, 2025 · Updated 7 months ago
- This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data ☆13 · Jul 21, 2024 · Updated last year
- ☆16 · Jun 10, 2025 · Updated 8 months ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,187 · Feb 8, 2026 · Updated 3 weeks ago
- Source code of paper: Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning ☆45 · Jun 24, 2025 · Updated 8 months ago