mnluzimu/WebGen-Bench

☆54

Alternatives and similar repositories for WebGen-Bench

Users interested in WebGen-Bench are comparing it to the repositories listed below.

mnluzimu / WebGen-Agent
☆21 · Jul 10, 2026 · Updated 2 weeks ago
mathllm / Step-Controlled_DPO
☆23 · Jul 5, 2024 · Updated 2 years ago
mathllm / VoiceAssistant-Eval
A rigorous framework for evaluating and guiding the development of next-generation AI assistants.
☆19 · Jan 26, 2026 · Updated 5 months ago
Euphoria16 / UI-Genie
[NeurIPS 2025] UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
☆60 · Nov 27, 2025 · Updated 7 months ago
thomasjoshi / agents-never-forget
☆18 · May 18, 2025 · Updated last year
Evanwu1125 / AutoWebWorld
☆25 · Jul 10, 2026 · Updated 2 weeks ago
liyongqi67 / LTRGR
☆21 · Aug 9, 2024 · Updated last year
McGill-NLP / agent-reward-bench
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
☆48 · Aug 7, 2025 · Updated 11 months ago
lm-playpen / playpen
All you need to get started with the LM Playpen Environment for Learning in Interaction.
☆16 · Jun 22, 2026 · Updated last month
MathGenie / MathGenie
☆14 · Mar 11, 2024 · Updated 2 years ago
bigcode-project / bigcodearena
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
☆61 · Oct 13, 2025 · Updated 9 months ago
AGI-Eval-Official / PRDBench
☆43 · May 29, 2026 · Updated last month
google-research-datasets / uicrit
UICrit is a dataset containing human-generated natural language design critiques, corresponding bounding boxes for each critique, and des…
☆27 · Nov 19, 2024 · Updated last year
microsoft / RepoLaunch
Automate the build, execution and test of GitHub repositories across programming languages and operating systems.
☆127 · Jun 16, 2026 · Updated last month
NJU-LINK / WebCompass
The Source Code for WebCompass
☆21 · May 2, 2026 · Updated 2 months ago
Leey21 / A-Data-Centric-Study
☆18 · Mar 2, 2026 · Updated 4 months ago
SALT-NLP / Sketch2Code
Code for the paper: Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping
☆41 · Oct 29, 2024 · Updated last year
LG-AI-EXAONE / KMMLU-Pro
☆16 · Aug 18, 2025 · Updated 11 months ago
WebPAI / Interaction2Code
[ASE 2025] Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping
☆59 · Jun 6, 2026 · Updated last month
allenai / multimodalqa
☆158 · Oct 12, 2022 · Updated 3 years ago
THUDM / SWE-Dev
[ACL25' Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline.
☆64 · Jul 21, 2025 · Updated last year
chang-github-00 / Predictive-Decoding
Repo for Anonymous purpose, pls don't distribute
☆10 · Oct 2, 2024 · Updated last year
MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆103 · Oct 23, 2024 · Updated last year
jsksxs360 / event-coref-emnlp2022
a within-document event coreference resolution system, trained and evaluated on the KBP corpus.
☆10 · May 15, 2023 · Updated 3 years ago
mathllm / MATH-V
[NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.
☆140 · May 16, 2025 · Updated last year
scaleapi / SWE-Atlas
open source SWE-Atlas
☆57 · Updated this week
nju-websoft / HuggingBench
[SIGIR 2025] Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph
☆17 · Jun 6, 2025 · Updated last year
eltociear / MolCA
Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter".
☆12 · Dec 27, 2023 · Updated 2 years ago
s2e-lab / Code-Smell-Code-Generation
Source code for "An Empirical Study of Code Smells in Transformer-based Code Generation Techniques".
☆11 · Oct 4, 2022 · Updated 3 years ago
JSJeong-me / GPT-Table
GPT Table Semantic Parsing with complex & non-intuitive structure.
☆17 · Jul 16, 2025 · Updated last year
vaguenebula / AlpacaDataReflect
An experiment to see if chatgpt can improve the output of the stanford alpaca dataset
☆12 · Mar 29, 2023 · Updated 3 years ago
MasterVito / SvS
Official Repo for SvS: A Self-play with Variational Problem Synthesis strategy for RLVR training
☆54 · Dec 13, 2025 · Updated 7 months ago
XuZhao0 / Model-Selection-Reasoning
Model Selection with Large Language Models for Reasoning (EMNLP2023 Findings)
☆30 · Dec 23, 2023 · Updated 2 years ago
zhisbug / ray-scalable-ml-design
Some microbenchmarks and design docs before commencement
☆11 · Feb 1, 2021 · Updated 5 years ago
HaoWeiHe / Knowledge-Graph
how to build up Knowledge graph
☆13 · Nov 16, 2021 · Updated 4 years ago
AmenRa / a-multi-domain-benchmark-for-personalized-search-evaluation
A Multi-domain Benchmark for Personalized Search Evaluation
☆12 · Sep 7, 2023 · Updated 2 years ago
OpenLMLab / ChatZoo
Light local website for displaying performances from different chat models.
☆86 · Nov 13, 2023 · Updated 2 years ago
DevoAllen / Awesome-Reasoning-Economy-Papers
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
☆124 · Oct 16, 2025 · Updated 9 months ago
castorini / nuggetizer
☆28 · Apr 19, 2026 · Updated 3 months ago