[ICLR 2025] Benchmarking Agentic Workflow Generation
☆150 · Feb 19, 2025 · Updated last year
Alternatives and similar repositories for WorfBench
Users that are interested in WorfBench are comparing it to the libraries listed below.
- 🔥🔥🔥 ICLR 2025 Oral. Automating Agentic Workflow Generation. ☆506 · Dec 25, 2025 · Updated 5 months ago
- Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs ☆27 · Nov 7, 2025 · Updated 6 months ago
- SkillOrchestra: Learning to Route Agents via Skill Transfer ☆63 · Mar 25, 2026 · Updated 2 months ago
- ☆28 · Nov 10, 2025 · Updated 6 months ago
- An open platform for enhancing the capability of LLMs in workflow orchestration. ☆189 · Mar 11, 2025 · Updated last year
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆416 · May 20, 2024 · Updated 2 years ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆168 · Dec 17, 2024 · Updated last year
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆163 · Oct 30, 2024 · Updated last year
- [Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models ☆19 · Mar 16, 2023 · Updated 3 years ago
- M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning ☆47 · Jul 17, 2025 · Updated 10 months ago
- This is the repository for the Tool Learning survey. ☆483 · Aug 9, 2025 · Updated 9 months ago
- MATCH-TUNING ☆15 · Aug 6, 2022 · Updated 3 years ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆137 · Apr 29, 2026 · Updated 3 weeks ago
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆35 · May 15, 2026 · Updated last week
- This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction ☆15 · Nov 4, 2024 · Updated last year
- Official repository of Graph RAG-Tool Fusion and ToolLinkOS dataset. ☆23 · Feb 13, 2025 · Updated last year
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆17 · Apr 2, 2025 · Updated last year
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,444 · Feb 8, 2026 · Updated 3 months ago
- [R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better" ☆129 · Oct 27, 2025 · Updated 6 months ago
- MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings) ☆81 · Aug 20, 2025 · Updated 9 months ago
- Must-read Papers on LLM Agents. ☆3,021 · Apr 17, 2026 · Updated last month
- ☆90 · Sep 11, 2024 · Updated last year
- A holistic framework for advancing LLMs as data science agents ☆49 · Updated this week
- This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data ☆13 · Jul 21, 2024 · Updated last year
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆60 · Jul 24, 2025 · Updated 10 months ago
- ☆20 · Mar 5, 2024 · Updated 2 years ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆66 · Oct 18, 2024 · Updated last year
- Towards Large Multimodal Models as Visual Foundation Agents ☆265 · Apr 24, 2025 · Updated last year
- Setup scripts for the WebArena benchmark ☆22 · Jun 19, 2025 · Updated 11 months ago
- Recursive Abstractive Processing for Tree-Organized Retrieval ☆10 · May 30, 2024 · Updated last year
- A JS web client for connecting to Pipecat bots with voice and vision ☆16 · Jan 16, 2025 · Updated last year
- AWM: Agent Workflow Memory ☆430 · Dec 22, 2025 · Updated 5 months ago
- MLCD-Seg is a zero-shot segmentation model from DeepGlint. ☆18 · Jul 4, 2025 · Updated 10 months ago
- ☆16 · Jul 23, 2024 · Updated last year
- ☆42 · Apr 8, 2026 · Updated last month
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews ☆17 · Dec 14, 2025 · Updated 5 months ago
- [NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents ☆540 · Sep 6, 2024 · Updated last year
- Time-R1: Framework and resources for endowing LLMs with comprehensive temporal reasoning (understanding, prediction, creative generation)… ☆72 · Jun 11, 2025 · Updated 11 months ago
- Code for NAACL2022 Long Paper "An Enhanced Span-based Decomposition Method for Few-Shot Sequence Labeling" ☆28 · Nov 9, 2022 · Updated 3 years ago