qiancheng0/EscapeBench

This is the repository for the paper "EscapeBench: Pushing Language Models to Think Outside the Box".

☆18

Alternatives and similar repositories for EscapeBench

Users who are interested in EscapeBench are comparing it to the repositories listed below.

qiancheng0 / ModelingAgent
☆23 · Sep 7, 2025 · Updated 10 months ago
OpenWebRL / OpenWebRL
Code for paper OpenWebRL: Online Multi-Turn Reinforcement Learning for Visual Web Agents
☆37 · Jul 9, 2026 · Updated 2 weeks ago
yuhui-zh15 / AutoConverter
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆40 · May 26, 2025 · Updated last year
VITA-Group / Trap-and-Replace-Backdoor-Defense
[NeurIPS'22] Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork. Haotao Wang, Junyuan Hong,…
☆15 · Nov 27, 2023 · Updated 2 years ago
xiye17 / EvalQAExpl
Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.
☆17 · Apr 25, 2021 · Updated 5 years ago
OpenBMB / Tell_Me_More
Repo for paper "Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents"
☆65 · Feb 20, 2024 · Updated 2 years ago
gangiswag / infogent
☆24 · Mar 1, 2025 · Updated last year
FSLight1996 / SHER
code of IJCAI submission "Soft Hindsight Experience Replay"
☆13 · Mar 23, 2020 · Updated 6 years ago
ZZZhr-1 / Robust_GUI_Grounding
On the Robustness of GUI Grounding Models Against Image Attacks
☆12 · Apr 8, 2025 · Updated last year
kstats / CausalQG
☆15 · Apr 19, 2021 · Updated 5 years ago
OSU-NLP-Group / UGround
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆315 · Mar 11, 2026 · Updated 4 months ago
henrycharlesworth / PlanGAN
☆18 · Jan 3, 2022 · Updated 4 years ago
mlfoundations / clip_quality_not_quantity
☆28 · Oct 18, 2022 · Updated 3 years ago
xinke-wang / LVLM-Playground
[ICLR2025] Are Large Vision Language Models Good Game Players?
☆13 · Mar 3, 2025 · Updated last year
ATR-DBI / Map-EQA
☆12 · Oct 10, 2024 · Updated last year
RLHFlow / GVM
☆16 · Jul 29, 2025 · Updated 11 months ago
Tinaliu0123 / speculative-verdict
Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation (ICLR 2026)
☆21 · Apr 27, 2026 · Updated 2 months ago
bdhingra / coref-gru
Model for processing text sequences with coreference annotations
☆14 · Nov 29, 2018 · Updated 7 years ago
benediktstroebl / agent-evals
☆27 · May 28, 2025 · Updated last year
HeimingX / TAG
Official code for Attention-driven GUI Grounding, AAAI2025
☆16 · Dec 17, 2024 · Updated last year
zt991211 / CLAMBER
A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models
☆23 · Jun 2, 2024 · Updated 2 years ago
moment-timeseries-foundation-model / TimeSeriesExam
☆16 · Mar 12, 2025 · Updated last year
penzant / nlu_datasets_2018
☆12 · Nov 9, 2018 · Updated 7 years ago
Semeval2019Task9 / Subtask-A
Data for SubTask A
☆17 · Dec 13, 2021 · Updated 4 years ago
QwenLM / ConsisEval
☆14 · Jul 5, 2024 · Updated 2 years ago
Infini-AI-Lab / M2PO
☆34 · Oct 8, 2025 · Updated 9 months ago
mega002 / qdmr-based-question-generation
The official code of TACL 2022, "Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition".
☆12 · Oct 18, 2021 · Updated 4 years ago
chijames / KERPLE
☆20 · Oct 25, 2022 · Updated 3 years ago
bnewm0609 / arxivDIGESTables
☆18 · Sep 15, 2025 · Updated 10 months ago
ahnjaewoo / FlashAdventure
🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"
☆27 · Apr 26, 2026 · Updated 3 months ago
spacetools / SpaceTools
code release
☆38 · Jun 22, 2026 · Updated last month
Jometeorie / MultiHopShortcuts
Reproduction Code for Paper "Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models"
☆14 · Jun 1, 2024 · Updated 2 years ago
YangRui2015 / RORL
Code for NeurIPS 2022 paper "Robust offline Reinforcement Learning via Conservative Smoothing"
☆24 · Feb 15, 2023 · Updated 3 years ago
RZkiller / AffordVLA
Afford-VLA: Action-Aligned Visual Planning via Internalized Affordance
☆16 · Jul 5, 2026 · Updated 2 weeks ago
jasenchn / checkwhy
☆11 · Sep 24, 2024 · Updated last year
MMesgar / neural_coherence_model
EMNLP-18
☆17 · Dec 21, 2021 · Updated 4 years ago
OSU-NLP-Group / Online-Mind2Web
An Illusion of Progress? Assessing the Current State of Web Agents
☆192 · Jun 25, 2026 · Updated last month
jnzs1836 / intent-vizor
☆16 · Jul 10, 2024 · Updated 2 years ago
weixuan-wang123 / ReMaKE
☆14 · Sep 1, 2025 · Updated 10 months ago