qiancheng0 / EscapeBench
This is the repository for the paper "EscapeBench: Pushing Language Models to Think Outside the Box".
☆14 · Updated 7 months ago
Alternatives and similar repositories for EscapeBench
Users who are interested in EscapeBench are comparing it to the repositories listed below.
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆146 · Updated 8 months ago
- ☆31 · Updated 8 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆138 · Updated 7 months ago
- ☆19 · Updated 4 months ago
- Natural Language Reinforcement Learning ☆92 · Updated 7 months ago
- ☆114 · Updated 5 months ago
- WONDERBREAD benchmark + dataset for BPM tasks ☆26 · Updated 9 months ago
- ☆38 · Updated 5 months ago
- ☆98 · Updated last year
- Official implementation of paper "Process Reward Model with Q-value Rankings" ☆60 · Updated 5 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆182 · Updated 3 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆69 · Updated last year
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆59 · Updated 6 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 · Updated last month
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆101 · Updated last month
- Reasoning with Language Model is Planning with World Model ☆168 · Updated last year
- Reinforced Multi-LLM Agents training ☆30 · Updated last month
- ☆61 · Updated 4 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆38 · Updated last month
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆110 · Updated 3 months ago
- ☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models ☆19 · Updated last month
- ☆55 · Updated 3 weeks ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆49 · Updated 8 months ago
- ☆50 · Updated last month
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆60 · Updated 9 months ago
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆140 · Updated last year
- Resources for the Enigmata Project. ☆53 · Updated last month
- ☆41 · Updated 8 months ago
- ☆107 · Updated 3 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆123 · Updated 10 months ago