HazyResearch / wonderbread
WONDERBREAD benchmark + dataset for BPM tasks
☆24Updated 4 months ago
Alternatives and similar repositories for wonderbread:
Users who are interested in wonderbread are comparing it to the repositories listed below:
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆129Updated 3 months ago
- Natural Language Reinforcement Learning☆77Updated 2 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings"☆49Updated last month
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control☆54Updated 2 months ago
- ☆20Updated 9 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆46Updated last year
- ☆39Updated 7 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆76Updated last week
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI☆107Updated last week
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆73Updated 2 months ago
- ☆52Updated last week
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆127Updated 4 months ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback☆103Updated last year
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)☆35Updated 2 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆84Updated 4 months ago
- NeurIPS 2024 tutorial on LLM Inference☆39Updated 3 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆116Updated 2 months ago
- ☆101Updated last month
- ☆26Updated last month
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy☆53Updated 2 months ago
- ☆95Updated 8 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (COLING 2025)☆26Updated 3 months ago
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans.☆68Updated 3 weeks ago
- Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf)☆53Updated 7 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated last year