OSU-NLP-Group/Mind2Web-2

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/OSU-NLP-Group/Mind2Web-2)

OSU-NLP-Group / Mind2Web-2

[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge

☆111

Alternatives and similar repositories for Mind2Web-2

Users that are interested in Mind2Web-2 are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

language-agent-tutorial / language-agent-tutorial.github.io
View on GitHub
[EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks
☆10Nov 27, 2024Updated last year
shulin16 / MMInA
View on GitHub
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆54Feb 27, 2025Updated last year
OSU-NLP-Group / Explorer
View on GitHub
[ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
☆29Feb 17, 2026Updated 5 months ago
OSU-NLP-Group / GUI-Agents-Paper-List
View on GitHub
Awesome GUI Agent Paper List
☆862Jun 28, 2026Updated 3 weeks ago
OSU-NLP-Group / Middleware
View on GitHub
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆37Dec 29, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
WebChoreArena / WebChoreArena
View on GitHub
COLM2026
☆36Jul 9, 2026Updated 2 weeks ago
RenzeLou / Muffin
View on GitHub
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
☆16Oct 31, 2024Updated last year
Ayanami0730 / deep_research_bench
View on GitHub
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
☆795May 11, 2026Updated 2 months ago
OSU-NLP-Group / EarlyExperience
View on GitHub
[ICML'26] "Agent Learning via Early Experience" (Open-source reproduction)
☆130Jul 8, 2026Updated 2 weeks ago
OSU-NLP-Group / In-Context-Reranking
View on GitHub
[ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"
☆44Mar 31, 2025Updated last year
OSU-NLP-Group / WebDreamer
View on GitHub
[TMLR'25] "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"
☆104Oct 5, 2025Updated 9 months ago
web-arena-x / visualwebarena
View on GitHub
VisualWebArena is a benchmark for multimodal agents.
☆484Nov 9, 2024Updated last year
OSU-NLP-Group / ScienceAgentBench
View on GitHub
[ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
☆149Updated this week
neulab / agent-data-protocol
View on GitHub
☆187Jul 14, 2026Updated last week
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
dki-lab / ArcaneQA
View on GitHub
☆23Aug 14, 2023Updated 2 years ago
OSU-NLP-Group / UGround
View on GitHub
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
☆315Mar 11, 2026Updated 4 months ago
Di-viner / LLM-Robustness-to-Irrelevant-Information
View on GitHub
[COLM'24] How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?
☆23Oct 13, 2024Updated last year
Hannibal046 / GPT-OSS-BrowseCompPlus-Eval
View on GitHub
Evaluating GPT-OSS on BrowseComp-Plus with Native Browsering Tools
☆20Oct 17, 2025Updated 9 months ago
OSU-NLP-Group / QUEST
View on GitHub
"QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks"
☆232Jul 16, 2026Updated last week
TEAM-ARM / arm
View on GitHub
[NeurIPS'25 Spotlight] ARM: Adaptive Reasoning Model
☆68Apr 6, 2026Updated 3 months ago
OSU-NLP-Group / llm-planning-eval
View on GitHub
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54Feb 23, 2024Updated 2 years ago
BAI-LAB / MoE-CL
View on GitHub
[WWW 2026 Oral] MoE-CL:Self-Evolving LLMs via Continual Instruction Tuning
☆21Dec 1, 2025Updated 7 months ago
GAIR-NLP / ResearcherBench
View on GitHub
ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry
☆51Jan 5, 2026Updated 6 months ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
ai-agents-2030 / ViMo
View on GitHub
☆26Apr 2, 2026Updated 3 months ago
OSU-NLP-Group / ACuRL
View on GitHub
An Autonomous Curriculum Reinforcement Learning framework that steers agents to continually learn in specific environments with zero huma…
☆38Jun 7, 2026Updated last month
web-arena-x / webarena
View on GitHub
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
☆1,552Nov 26, 2025Updated 7 months ago
Tongyi-MAI / MobileWorld
View on GitHub
Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments (ACL 2026)
☆242Jul 2, 2026Updated 3 weeks ago
hkust-nlp / WebExplorer
View on GitHub
The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"
☆120Sep 29, 2025Updated 9 months ago
dki-lab / few-shot-bioIE
View on GitHub
True Few-Shot BioIE: Benchmarking GPT-3 In-Context and Small PLM Fine-Tuning
☆12Jul 6, 2022Updated 4 years ago
OSU-NLP-Group / SeeAct
View on GitHub
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…
☆851Feb 3, 2025Updated last year
OSU-NLP-Group / AgentSafety
View on GitHub
☆192Oct 31, 2025Updated 8 months ago
OSU-NLP-Group / reversal-curse-binding
View on GitHub
☆25Apr 3, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
yueqis / API-Based-Agent
View on GitHub
☆69Jun 27, 2025Updated last year
TIGER-AI-Lab / BrowserAgent
View on GitHub
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions [TMLR2025]
☆34Jan 13, 2026Updated 6 months ago
magicgh / Self-MAP
View on GitHub
[ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents
☆16Oct 12, 2024Updated last year
OSU-NLP-Group / LLM-Knowledge-Conflict
View on GitHub
[ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"
☆84Apr 12, 2024Updated 2 years ago
ServiceNow / BrowserGym
View on GitHub
🌎💪 BrowserGym, a Gym environment for web task automation
☆1,284Updated this week
ZJU-REAL / InftyThink-Plus
View on GitHub
[ICML 2026] InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
☆33May 25, 2026Updated last month
jinzhuoran / RAG-RewardBench
View on GitHub
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆18Dec 19, 2024Updated last year