HazyResearch / wonderbread
WONDERBREAD benchmark + dataset for BPM tasks
☆23 Updated 3 months ago
Alternatives and similar repositories for wonderbread:
Users who are interested in wonderbread are comparing it to the repositories listed below.
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆107 Updated last month
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆110 Updated 2 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆53 Updated 10 months ago
- ☆83 Updated last week
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆100 Updated last year
- Natural Language Reinforcement Learning ☆68 Updated last month
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆66 Updated 2 weeks ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆46 Updated last year
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆102 Updated last month
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆54 Updated last week
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated 11 months ago
- PreAct: Prediction Enhances Agent's Planning Ability (Coling2025) ☆25 Updated last month
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆101 Updated this week
- ☆48 Updated last month
- [ACL 2024] The project of Symbol-LLM ☆46 Updated 6 months ago
- 🌾 OAT: Online AlignmenT for LLMs ☆81 Updated 3 weeks ago
- ☆21 Updated 2 months ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System" ☆47 Updated 2 months ago
- Evaluate the Quality of Critique ☆35 Updated 7 months ago
- Benchmarking Agentic Workflow Generation ☆36 Updated last month
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆27 Updated 7 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆35 Updated 3 weeks ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆88 Updated 3 months ago
- ☆20 Updated 7 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆54 Updated last month
- ☆43 Updated 3 weeks ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards ☆49 Updated 8 months ago
- Official implementation of paper "Process Reward Model with Q-value Rankings" ☆28 Updated 2 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆111 Updated 2 months ago
- ☆93 Updated 6 months ago