Princeton-AI2-Lab/Web-World-Models

Official Project Page for Web World Models (https://arxiv.org/abs/2512.23676)

☆92

Alternatives and similar repositories for Web-World-Models

Users who are interested in Web-World-Models are comparing it to the repositories listed below.

yifanzhang-pro / lanser-cli
[Lanser-CLI] Official Implementation of "Reinforcement Learning from Compiler and Language Server Feedback" (https://arxiv.org/abs/2510.2…
☆18 · Jun 15, 2026 · Updated last month
GregxmHu / OccuBench
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
☆21 · Apr 14, 2026 · Updated 3 months ago
ChengpengLi1003 / CoRT
☆72 · Oct 23, 2025 · Updated 8 months ago
Aaron617 / text2world
[ACL 2025 Findings] Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
☆29 · Feb 25, 2025 · Updated last year
BryceZhuo / PolyCom
The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models".
☆18 · Apr 25, 2025 · Updated last year
X1AOX1A / Word2World
[ACL 2026 Oral] From Word to World: Can Large Language Models be Implicit Text-based World Models?
☆66 · Apr 13, 2026 · Updated 3 months ago
llm-in-sandbox / llm-in-sandbox
Computer Environments Elicit General Agentic Intelligence in LLMs
☆237 · Updated this week
shiqichen17 / SPA
GitHub repository for "Internalizing World Models via Self-Play Finetuning for Agentic RL"
☆36 · Nov 1, 2025 · Updated 8 months ago
Amirhosein-gh98 / Gnosis
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
☆46 · Updated this week
KodCode-AI / code-r1
Reproducing R1 for Code with Reliable Rewards
☆13 · Apr 9, 2025 · Updated last year
Evanwu1125 / LiteCoT
☆17 · Jun 10, 2025 · Updated last year
TemporaryLoRA / FreeLM
☆15 · Feb 10, 2026 · Updated 5 months ago
GreatX3 / ProAct
ProAct is a framework designed to enable Large Language Model (LLM) agents to perform accurate, multi-turn lookahead reasoning in interac…
☆18 · Feb 11, 2026 · Updated 5 months ago
CogComp / TCR
Temporal and Causal Reasoning (dataset)
☆10 · Apr 19, 2022 · Updated 4 years ago
McGill-NLP / agent-reward-bench
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
☆47 · Aug 7, 2025 · Updated 11 months ago
OSU-NLP-Group / Explorer
[ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
☆29 · Feb 17, 2026 · Updated 5 months ago
SalesforceAIResearch / PretrainRL-pipeline
An automated data pipeline scaling RL to pretraining levels
☆76 · Jun 2, 2026 · Updated last month
Frostlinx / SearchEyes
SearchEyes: Towards Frontier Multimodal Deep Search Intelligence via Search World Simulation. A typed knowledge graph unifies data synthe…
☆20 · Jul 8, 2026 · Updated last week
Euphoria16 / UI-Genie
[NeurIPS 2025] UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
☆60 · Nov 27, 2025 · Updated 7 months ago
cescchen1990 / TsinghuaNet
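Tsinghua University campus network client and connection library, for command-line environments, Windows, Linux, and Mac OS X desktop platforms, and UWP, iOS, and Android mobile platforms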
☆12 · Mar 3, 2020 · Updated 6 years ago
OpenGVLab / ZeroGUI
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
☆119 · Jul 17, 2025 · Updated last year
wutaiqiang / MI
Official code for paper "Revisiting Model Interpolation for Efficient Reasoning"
☆17 · Jul 14, 2026 · Updated last week
kyle8581 / WMA-Agents
Official code repository for "Web Agents with World Models" (ICLR 2025).
☆31 · Mar 2, 2025 · Updated last year
ReCAP-Stanford / ReCAP
ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents, NeurIPS 2025
☆38 · Nov 15, 2025 · Updated 8 months ago
bethgelab / delta-belief-rl
Official implementation of the ΔBelief-RL method.
☆31 · Feb 28, 2026 · Updated 4 months ago
thu-coai / SPaR
☆47 · Jun 11, 2025 · Updated last year
Trae1ounG / BuPO
[arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
☆60 · Feb 6, 2026 · Updated 5 months ago
mll-lab-nu / ENACT
ENACT is a benchmark that evaluates embodied cognition through world modeling from egocentric interaction. It is designed to be simple an…
☆52 · Nov 27, 2025 · Updated 7 months ago
LiangThree / MCMA
☆15 · Jan 12, 2026 · Updated 6 months ago
UbiquantAI / URM
Universal Reasoning Model
☆134 · Jan 15, 2026 · Updated 6 months ago
interactivebench / InteractiveBench
Official Project Page for Interactive Benchmarks
☆31 · May 12, 2026 · Updated 2 months ago
Leey21 / A-Data-Centric-Study
☆18 · Mar 2, 2026 · Updated 4 months ago
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆18 · Dec 19, 2024 · Updated last year
INK-USC / Reflect
Data and Code for Paper "Reflect Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality" (EMNLP 2022)
☆11 · Nov 28, 2022 · Updated 3 years ago
Leey21 / CipherBank
☆13 · Jun 13, 2025 · Updated last year
JT-GUIAgent / JT-GUIAgent
☆17 · Jul 15, 2025 · Updated last year
JiayiGeng / CAID
Code repo for paper: Effective Strategies for Asynchronous Software Engineering Agents
☆64 · Apr 2, 2026 · Updated 3 months ago
pzs19 / LEMMA
☆16 · Sep 4, 2025 · Updated 10 months ago
fangjf1 / OpenSafeMLRM
The first toolkit for MLRM safety evaluation, providing a unified interface for mainstream models, datasets, and jailbreaking methods!
☆15 · Apr 8, 2025 · Updated last year