neulab / MultiUI

Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"

☆52

Alternatives and similar repositories for MultiUI

Users who are interested in MultiUI are comparing it to the repositories listed below.


shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆46 Updated 5 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆58 Updated 9 months ago
xlang-ai / OSWorld-G
Scaling Computer-Use Grounding via UI Decomposition and Synthesis
☆91 Updated last month
EternityYW / Gemini-Commonsense-Evaluation
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
☆36 Updated last year
cyzus / thoughtsculpt
☆13 Updated 7 months ago
MetaStone-AI / MetaStone-S1
The open-source code of MetaStone-S1.
☆83 Updated 2 weeks ago
du-nlp-lab / MLR-Copilot
☆66 Updated 4 months ago
yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆102 Updated last week
dvlab-research / ARPO
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
☆99 Updated 2 months ago
tianyi-lab / C3PO
Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"
☆16 Updated 3 months ago
dinobby / MAgICoRE
☆24 Updated 10 months ago
JieyuZ2 / TaskMeAnything
[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.
☆72 Updated 8 months ago
vis-nlp / ChartGemma
☆66 Updated last year
hewei2001 / ReachQA
Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"
☆53 Updated 9 months ago
MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆86 Updated 9 months ago
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35 Updated last year
dvlab-research / MR-GSM8K
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
☆50 Updated last year
DAMO-NLP-SG / multimodal_textbook
[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆167 Updated 4 months ago
reka-ai / reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
☆173 Updated 7 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68 Updated 3 months ago
agents-x-project / PyVision
Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."
☆103 Updated last week
TIGER-AI-Lab / VisualWebInstruct
The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"
☆26 Updated 2 months ago
Dongping-Chen / GUI-World
(ICLR 2025) The Official Code Repository for GUI-World.
☆61 Updated 7 months ago
zengxingchen / ChartQA-MLLM
[IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc…
☆68 Updated 6 months ago
GAIR-NLP / PC-Agent-E
Efficient Agent Training for Computer Use
☆120 Updated last month
Yan98 / GTA1
☆75 Updated 2 weeks ago
eric-ai-lab / Screen-Point-and-Read
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
☆28 Updated last year
si0wang / ThinkLite-VL
☆87 Updated last month
allenai / pixmo-docs
ACL 2025: Synthetic data generation pipelines for text-rich images.
☆107 Updated 5 months ago
declare-lab / LLM-PuzzleTest
This repository is maintained to release dataset and models for multimodal puzzle reasoning.
☆99 Updated 5 months ago