zjunlp / WorfBench
[ICLR 2025] Benchmarking Agentic Workflow Generation
☆40Updated 2 months ago
Alternatives and similar repositories for WorfBench:
Users who are interested in WorfBench are comparing it to the repositories listed below
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆104Updated last month
- Reformatted Alignment☆113Updated 4 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆90Updated 3 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆114Updated 3 months ago
- ☆98Updated last month
- ☆27Updated last month
- ☆87Updated last week
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"☆52Updated 9 months ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆91Updated last month
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales☆66Updated 2 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆111Updated 2 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆107Updated 8 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated 11 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated 11 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆136Updated last month
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆40Updated 2 months ago
- Code for the paper "SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning"☆48Updated last year
- ☆40Updated 3 months ago
- ☆48Updated last month
- ☆120Updated 7 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆68Updated 3 weeks ago
- Towards Large Multimodal Models as Visual Foundation Agents☆169Updated last month
- ☆108Updated 7 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …☆66Updated last month
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆58Updated last week
- ☆45Updated 3 months ago
- ☆49Updated 4 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆45Updated last month
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆79Updated 3 months ago