This is the repository for the paper "EscapeBench: Pushing Language Models to Think Outside the Box"
☆18Dec 19, 2024Updated last year
Alternatives and similar repositories for EscapeBench
Users that are interested in EscapeBench are comparing it to the libraries listed below.
- ☆22Sep 7, 2025Updated 9 months ago
- What if you need more exercises?☆37Jul 16, 2024Updated last year
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…☆40May 26, 2025Updated last year
- Winner of Cloth Competition: ICRA 2023, ICRA 2024 - Center Direction Network for Grasping Point Localization on Cloths - IEEE Robotic…☆23May 2, 2026Updated last month
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.☆17Apr 25, 2021Updated 5 years ago
- ☆16Jul 29, 2025Updated 10 months ago
- SVIP: Towards Verifiable Inference of Open-Source Large Language Models☆15Jun 3, 2025Updated last year
- [ICLR 2024] DMBP: Diffusion Model-Based Predictor for Robust Offline Reinforcement Learning against State Observations Perturbations.☆18May 24, 2024Updated 2 years ago
- ☆25Nov 30, 2020Updated 5 years ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆84May 7, 2024Updated 2 years ago
- ☆15Apr 19, 2021Updated 5 years ago
- ☆11Apr 29, 2019Updated 7 years ago
- [ICLR2025] Are Large Vision Language Models Good Game Players?☆13Mar 3, 2025Updated last year
- ☆18Jan 3, 2022Updated 4 years ago
- Collections of Undergraduate Course Projects☆22Updated this week
- Code for NeurIPS 2022 paper "Robust offline Reinforcement Learning via Conservative Smoothing"☆24Feb 15, 2023Updated 3 years ago
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science☆35Oct 25, 2024Updated last year
- [ICLR 2024 Spotlight] Code for ICLR 2024 paper "Towards Robust Offline Reinforcement Learning under Diverse Data Corruption"☆22Nov 25, 2024Updated last year
- Modular-HER is revised from OpenAI baselines and supports many improvements for Hindsight Experience Replay as modules.☆17Jun 23, 2021Updated 4 years ago
- ☆27May 28, 2025Updated last year
- An LLM-powered agent for NetHack☆23Nov 4, 2024Updated last year
- The official repo for the code and data of paper SMART☆40Feb 20, 2025Updated last year
- TensorFlow implementation of the paper `Adversarial Multi-task Learning for Text Classification`☆11Apr 11, 2018Updated 8 years ago
- ☆12May 22, 2026Updated 3 weeks ago
- ☆12Nov 9, 2018Updated 7 years ago
- [CVPR 2025] GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration☆20Mar 21, 2025Updated last year
- Data for SubTask A☆17Dec 13, 2021Updated 4 years ago
- Escape room adventure game developed in Unity 3D☆12Apr 28, 2019Updated 7 years ago
- Open-source repository for the OOPSLA'24 paper "CYCLE: Learning to Self-Refine Code Generation"☆10Mar 8, 2024Updated 2 years ago
- ☆20Oct 25, 2022Updated 3 years ago
- ☆15Jul 5, 2024Updated last year
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans.☆138Apr 30, 2026Updated last month
- The official code of TACL 2022, "Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition".☆12Oct 18, 2021Updated 4 years ago
- [ACL '26 Findings] V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in MLLMs☆27Apr 28, 2026Updated last month
- ☆44Oct 31, 2025Updated 7 months ago
- GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents☆191Jun 5, 2026Updated last week
- ☆14Jun 10, 2019Updated 7 years ago
- ☆18Sep 15, 2025Updated 9 months ago
- Code for ACL 2018 paper "Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference".☆17Aug 5, 2018Updated 7 years ago