Halluminate/WebBench

📚 Benchmark your browser agent on ~2.5k READ and ACTION based tasks

☆98

Alternatives and similar repositories for WebBench

Users who are interested in WebBench are comparing it to the repositories listed below.

convergence-ai / webgames
Challenges for general-purpose web-browsing AI agents
☆68 • Jun 2, 2025 • Updated last year
xiaoyuxin1002 / UQ-PLM
Uncertainty Quantification with Pre-trained Language Models: An Empirical Analysis
☆15 • Oct 11, 2022 • Updated 3 years ago
Halluminate / westworld
☆19 • Mar 7, 2026 • Updated 4 months ago
browser-use / agent-studio
☆26 • Jul 31, 2025 • Updated 11 months ago
TingchenFu / PersonaKGC
☆28 • Mar 12, 2022 • Updated 4 years ago
xlang-ai / OSWorld-G
[NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis
☆172 • Jun 18, 2026 • Updated last month
nottelabs / browserarena
The Browser Arena
☆28 • Updated this week
YXB-NKU / SE-GUI
[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
☆108 • Oct 21, 2025 • Updated 9 months ago
ServiceNow / BrowserGym
🌎💪 BrowserGym, a Gym environment for web task automation
☆1,296 • Jul 17, 2026 • Updated last week
hud-evals / hud-python
RL environments + evals for AI agents. Define once, train anything.
☆279 • Updated this week
OSU-NLP-Group / Online-Mind2Web
An Illusion of Progress? Assessing the Current State of Web Agents
☆192 • Jun 25, 2026 • Updated last month
agentsea / osuniverse
Benchmark of complex, multimodal desktop-oriented tasks for advanced GUI-navigation AI agents
☆24 • May 7, 2025 • Updated last year
michaelfeil / candle-flash-attn-v3
☆15 • Dec 21, 2025 • Updated 7 months ago
zhulishe / Quantitative-investment
Use strategy in stock transaction for high revenue.
☆10 • Dec 24, 2015 • Updated 10 years ago
hrwise-nlp / ToolsMeetLLMs
☆33 • May 8, 2025 • Updated last year
zhudotexe / fanoutqa
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024)
☆62 • Apr 3, 2026 • Updated 3 months ago
shawnsihyunlee / simulatedtom
Public repository for "Think Twice: Perspective-Taking Improves Large Language Models’ Theory-of-Mind Capabilities".
☆25 • Aug 16, 2023 • Updated 2 years ago
c-oberle / clone-detection-tools
Overview of Clone Detection Tools for Java
☆14 • Aug 23, 2025 • Updated 11 months ago
s2e-lab / Code-Smell-Code-Generation
Source code for "An Empirical Study of Code Smells in Transformer-based Code Generation Techniques".
☆11 • Oct 4, 2022 • Updated 3 years ago
JSJeong-me / GPT-Table
GPT Table Semantic Parsing with complex & non-intuitive structure.
☆17 • Jul 16, 2025 • Updated last year
hrwise-nlp / AppBench
This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction
☆16 • Nov 4, 2024 • Updated last year
Venkat2811 / myelon
Ultra-low-latency, high-throughput multiprocess transport over SHM and mmap. LMAX-Disruptor-style cross-process ring substrate.
☆17 • Updated this week
OSU-NLP-Group / SeeActChromeExtension
☆18 • Jan 3, 2025 • Updated last year
ajyl / mech_int_othelloGPT
☆10 • Nov 6, 2024 • Updated last year
micheletufano / DeepMutation
☆12 • Mar 24, 2023 • Updated 3 years ago
princeton-pli / hal-harness
☆311 • Jul 1, 2026 • Updated 3 weeks ago
wssun / PromptCS
A Prompt Learning Framework for Source Code Summarization
☆14 • Dec 26, 2023 • Updated 2 years ago
likaixin2000 / ScreenSpot-Pro-GUI-Grounding
GUI Grounding for Professional High-Resolution Computer Use
☆383 • Jun 17, 2026 • Updated last month
AgentbaseHQ / Portal
Portal: GUI Tools for Agents
☆25 • Sep 18, 2025 • Updated 10 months ago
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆54 • Jun 6, 2025 • Updated last year
MinorJerry / WebVoyager
Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"
☆1,110 • Mar 4, 2024 • Updated 2 years ago
facebookresearch / tce
Library for the Test-based Calibration Error (TCE) metric to quantify the degree of classifier calibration.
☆14 • Sep 15, 2023 • Updated 2 years ago
web-arena-x / visualwebarena
VisualWebArena is a benchmark for multimodal agents.
☆484 • Nov 9, 2024 • Updated last year
metehan777 / awesome-workflow-use-ideas
Awesome workflow-use conceptual steps & prompts
☆26 • May 19, 2025 • Updated last year
teilomillet / retrain
a Python library that uses Reinforcement Learning (RL) to train LLMs.
☆43 • Jul 12, 2026 • Updated 2 weeks ago
ictnlp / AIH
Code for Findings of ACL 2021 paper "Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain …
☆19 • Dec 16, 2022 • Updated 3 years ago
XiaojuanTang / Mars
a benchmark to evaluate the situated inductive reasoning
☆16 • Jan 7, 2025 • Updated last year
hccngu / Meta-SN
☆11 • May 23, 2023 • Updated 3 years ago
VeriGUI-Team / VeriWeb
VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking
☆88 • Jan 21, 2026 • Updated 6 months ago