This is the repository for the paper "EscapeBench: Pushing Language Models to Think Outside the Box"
☆18 · Dec 19, 2024 · Updated last year
Alternatives and similar repositories for EscapeBench
Users that are interested in EscapeBench are comparing it to the libraries listed below.
- ☆21 · Sep 7, 2025 · Updated 6 months ago
- What if you need more exercises? ☆33 · Jul 16, 2024 · Updated last year
- This is the repository for paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models" ☆30 · Oct 8, 2023 · Updated 2 years ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆40 · May 26, 2025 · Updated 9 months ago
- Winner of Cloth Competition: ICRA 2023, ICRA 2024 - Center Direction Network for Grasping Point Localization on Cloths - IEEE Robotic… ☆22 · Feb 2, 2026 · Updated last month
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. ☆18 · Apr 25, 2021 · Updated 4 years ago
- ☆16 · Jul 29, 2025 · Updated 7 months ago
- SVIP: Towards Verifiable Inference of Open-Source Large Language Models ☆14 · Jun 3, 2025 · Updated 9 months ago
- code of IJCAI submission "Soft Hindsight Experience Replay" ☆13 · Mar 23, 2020 · Updated 6 years ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆81 · May 7, 2024 · Updated last year
- ☆15 · Apr 19, 2021 · Updated 4 years ago
- ☆15 · Jan 18, 2026 · Updated 2 months ago
- ☆12 · Oct 10, 2024 · Updated last year
- [ICLR2025] Are Large Vision Language Models Good Game Players? ☆12 · Mar 3, 2025 · Updated last year
- ☆22 · Oct 31, 2025 · Updated 4 months ago
- ☆18 · Jan 3, 2022 · Updated 4 years ago
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆304 · Mar 11, 2026 · Updated last week
- ☆29 · Oct 18, 2022 · Updated 3 years ago
- Code for NeurIPS 2022 paper "Robust offline Reinforcement Learning via Conservative Smoothing" ☆24 · Feb 15, 2023 · Updated 3 years ago
- [ICLR 2024 Spotlight] Code for ICLR 2024 paper "Towards Robust Offline Reinforcement Learning under Diverse Data Corruption" ☆20 · Nov 25, 2024 · Updated last year
- Model for processing text sequences with coreference annotations ☆14 · Nov 29, 2018 · Updated 7 years ago
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images ☆18 · Jun 4, 2025 · Updated 9 months ago
- A LLM-powered agent for NetHack ☆21 · Nov 4, 2024 · Updated last year
- ☆25 · May 28, 2025 · Updated 9 months ago
- The official repo for the code and data of paper SMART ☆40 · Feb 20, 2025 · Updated last year
- ☆12 · Updated this week
- ☆18 · Mar 12, 2025 · Updated last year
- Building a quick conversation-based search demo with langchain. ☆10 · Apr 2, 2024 · Updated last year
- ☆12 · Nov 9, 2018 · Updated 7 years ago
- Evaluation Pipeline for medical tasks. ☆12 · Feb 13, 2026 · Updated last month
- Data for SubTask A ☆17 · Dec 13, 2021 · Updated 4 years ago
- Escape room adventure game developed in Unity 3D ☆12 · Apr 28, 2019 · Updated 6 years ago
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans. ☆124 · Dec 4, 2025 · Updated 3 months ago
- V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in MLLMs ☆24 · Jul 31, 2025 · Updated 7 months ago
- ☆14 · Jul 5, 2024 · Updated last year
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆73 · Nov 27, 2024 · Updated last year
- The official code of TACL 2022, "Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition". ☆12 · Oct 18, 2021 · Updated 4 years ago
- ☆14 · Jun 10, 2019 · Updated 6 years ago
- ☆13 · Sep 26, 2024 · Updated last year