MetaAgentX/OpenCaptchaWorld

[NeurIPS 2025] The first web-based benchmark and platform to evaluate visual reasoning and interaction capabilities of MLLM powered agents through diverse and dynamic CAPTCHA puzzles.

☆82

Alternatives and similar repositories for OpenCaptchaWorld

Users who are interested in OpenCaptchaWorld are comparing it to the repositories listed below.

Jiacheng8 / CV-DD
Dataset Distillation via Committee Voting
☆15 · Jul 28, 2025 · Updated 11 months ago
Yaxin9Luo / Gamma-MOD
[ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models
☆45 · Oct 28, 2025 · Updated 8 months ago
VILA-Lab / M-Attack
[NeurIPS25 & ICML25 Workshop on Reliable and Responsible Foundation Models] A Simple Baseline Achieving Over 90% Success Rate Against the…
☆100 · Feb 3, 2026 · Updated 5 months ago
shaoshitong / EDC
Elucidated Dataset Condensation (NeurIPS 2024)
☆20 · Oct 5, 2024 · Updated last year
TimeBlindness / time-blindness
[CVPR 2026 🔥] Time Blindness: Why Video-Language Models Can't See What Humans Can?
☆65 · Jan 28, 2026 · Updated 5 months ago
VILA-Lab / DELT
(CVPR 2025) Official implementation to DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation which outperforms SOTA…
☆28 · Aug 23, 2025 · Updated 10 months ago
MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆103 · Oct 23, 2024 · Updated last year
xnancy / russ
☆16 · Apr 9, 2021 · Updated 5 years ago
VILA-Lab / Open-LLM-Leaderboard
Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
☆53 · Jun 27, 2024 · Updated 2 years ago
WebPAI / ComUICoder
[SIGKDD 2026] ComUICoder: Component-based Reusable UI Code Generation for Complex Websites via Semantic Segmentation and Element-wise Fee…
☆24 · Jun 2, 2026 · Updated last month
szq0214 / Un-Mix
Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning.
☆150 · Aug 10, 2022 · Updated 3 years ago
Zicheng-He / PCA-LSTM-in-stock-price-prediction
Stock Price Prediction with PCA and LSTM
☆14 · Mar 3, 2021 · Updated 5 years ago
WebPAI / EfficientUICoder
[FSE 2026] EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression
☆26 · May 5, 2026 · Updated 2 months ago
showlab / Long-form-Video-Prior
☆32 · May 3, 2024 · Updated 2 years ago
OpenGVLab / SDLM
Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation lengt…
☆98 · Dec 27, 2025 · Updated 6 months ago
violet-liang / soundfield-reconstruction-np
Sound field reconstruction using neural processes with dynamic kernels
☆16 · Mar 25, 2025 · Updated last year
WangWenhao0716 / PDF-Embedding
[NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"
☆18 · Oct 1, 2024 · Updated last year
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64 · Nov 7, 2024 · Updated last year
VeriGUI-Team / VeriWeb
VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking
☆88 · Jan 21, 2026 · Updated 6 months ago
Hansxsourse / VRMDiff
☆11 · Mar 11, 2025 · Updated last year
lt-asset / Waffle
For ACL25 paper "WAFFLE: Multi-Modal Model for Automated Front-End Development" - by Shanchao Liang and Nan Jiang and Shangshu Qian and L…
☆12 · May 28, 2025 · Updated last year
mikewlcheung / code-in-articles
Computer code used in articles
☆11 · Apr 27, 2026 · Updated 2 months ago
niuzaisheng / ScreenExplorer
ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World
☆26 · Jun 17, 2025 · Updated last year
HazyResearch / wonderbread
WONDERBREAD benchmark + dataset for BPM tasks
☆35 · Jul 30, 2025 · Updated 11 months ago
southnx / ACoLP
Open Set Video HOI detection from Action-centric Chain-of-Look Prompting, ICCV2023
☆12 · Oct 3, 2023 · Updated 2 years ago
Somoy73 / Frontend-UI-Element-Detection-and-Classification
Detection and Classification of UI Elements of Web pages and Apps from Wireframe Sketches
☆10 · Oct 9, 2023 · Updated 2 years ago
ZJUSCL / MVP
Multi-View prediction enhances GUI Grounding
☆21 · Feb 22, 2026 · Updated 4 months ago
showlab / ROICtrl
Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
☆110 · Apr 16, 2025 · Updated last year
LaoKuiZe / AppAgent-Pro
☆16 · Aug 27, 2025 · Updated 10 months ago
shaoshitong / G_VBSM_Dataset_Condensation
[CVPR2024 highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM)
☆27 · Oct 9, 2024 · Updated last year
MCG-NJU / RGE
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
☆15 · Nov 29, 2025 · Updated 7 months ago
VILA-Lab / SRe2L
(NeurIPS 2023 spotlight) Large-scale Dataset Distillation/Condensation, 50 IPC (Images Per Class) achieves the highest 60.8% on original …
☆141 · Nov 15, 2024 · Updated last year
sholokhovalexey / active-noise-control
Active noise controller (ANC) design: a practical primer
☆15 · Jan 8, 2026 · Updated 6 months ago
mbzuai-oryx / VideoMolmo
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
☆56 · Jul 5, 2025 · Updated last year
OPPO-Mente-Lab / AndesVL_Evaluation
☆26 · Apr 15, 2026 · Updated 3 months ago
OPPO-Mente-Lab / DaMo
The official implementation of the paper "DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents"
☆30 · Oct 23, 2025 · Updated 8 months ago
ayiyayi / EgoExoBench
☆15 · Nov 13, 2025 · Updated 8 months ago
VILA-Lab / DRAG
(ACL 2025 Main) Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillat…
☆35 · Aug 23, 2025 · Updated 10 months ago
PopeyePxx / MKA
☆21 · Dec 23, 2025 · Updated 6 months ago