HazyResearch / wonderbread
WONDERBREAD benchmark + dataset for BPM tasks
☆24 Updated 6 months ago
Alternatives and similar repositories for wonderbread:
Users who are interested in wonderbread are comparing it to the repositories listed below.
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆134 Updated 5 months ago
- ☆107 Updated 3 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆80 Updated 3 weeks ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆47 Updated last year
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents" ☆69 Updated 2 weeks ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆84 Updated last month
- ☆81 Updated this week
- Natural Language Reinforcement Learning ☆87 Updated 4 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆56 Updated 3 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 5 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆36 Updated 3 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆56 Updated 2 months ago
- Evaluate the Quality of Critique ☆34 Updated 10 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- ☆15 Updated last month
- ☆46 Updated 2 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆91 Updated 3 weeks ago
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" ☆53 Updated 10 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆126 Updated 4 months ago
- [Preprint] A Generalizable and Purely Unsupervised Self-Training Framework ☆50 Updated last week
- An Illusion of Progress? Assessing the Current State of Web Agents ☆38 Updated this week
- ☆43 Updated 8 months ago
- ☆22 Updated 10 months ago
- ☆57 Updated last month
- ☆55 Updated 2 weeks ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System" ☆56 Updated 5 months ago
- ☆19 Updated 11 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆79 Updated 2 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆53 Updated last year