Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"
☆53Dec 12, 2024Updated last year
Alternatives and similar repositories for MultiUI
Users who are interested in MultiUI are comparing it to the libraries listed below.
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents☆305Mar 11, 2026Updated 3 weeks ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆65Oct 19, 2024Updated last year
- Interface for GenAI-Arena [NeurIPS24]☆17Feb 27, 2024Updated 2 years ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)☆37Dec 29, 2024Updated last year
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents☆27Feb 17, 2026Updated last month
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents☆441Apr 20, 2025Updated 11 months ago
- ☆16Feb 12, 2026Updated last month
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents, CVPR 2025☆25Jan 25, 2025Updated last year
- [NeurIPS 2024] Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method☆15Oct 1, 2024Updated last year
- ☆20Apr 24, 2024Updated last year
- GuessWhat?! is a challenging task-oriented visual dialogue problem. TensorFlow code for the papers, <Visual Dialogue State Tracking f…☆11May 16, 2024Updated last year
- Seamless Voice Interactions with LLMs☆12Oct 28, 2023Updated 2 years ago
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.☆32Feb 26, 2025Updated last year
- Landing page + leaderboard for SWE-Bench benchmark☆12Mar 29, 2026Updated last week
- ☆16May 22, 2022Updated 3 years ago
- GUI Grounding for Professional High-Resolution Computer Use☆357Mar 4, 2026Updated last month
- We introduce OpenStory++, a large-scale open-domain dataset focusing on enabling MLLMs to perform storytelling generation tasks.☆17Aug 30, 2024Updated last year
- A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o…☆27Aug 7, 2025Updated 8 months ago
- Code for WALT – Web Agents that Learn Tools☆67Oct 30, 2025Updated 5 months ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups☆51Dec 23, 2024Updated last year
- (ICLR 2025) The Official Code Repository for GUI-World.☆69Dec 18, 2024Updated last year
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs☆144Apr 22, 2025Updated 11 months ago
- Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning".☆165Dec 27, 2023Updated 2 years ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos☆51Feb 22, 2026Updated last month
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆30Mar 5, 2024Updated 2 years ago
- ☆29Feb 27, 2025Updated last year
- Code for ACL 21: Generating Query Focused Summaries from Query-Free Resources☆33Jul 15, 2022Updated 3 years ago
- ☆64Nov 28, 2022Updated 3 years ago
- This is the official repository of the paper "Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Schedulin…☆13Jul 27, 2025Updated 8 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆140Mar 1, 2026Updated last month
- R1-like Computer-use Agent☆89Mar 21, 2025Updated last year
- Dialogue Graph Modeling for Conversational Machine Reading (ACL 2021, Findings)☆18Nov 29, 2022Updated 3 years ago
- The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]☆27Dec 28, 2024Updated last year
- ☆50Jun 7, 2025Updated 10 months ago
- Detecting Drift in a Diabetes Dataset using Taipy☆12May 19, 2025Updated 10 months ago
- Notebook on computer science, linguistics, deep learning, opinions, and life. Making it bilingual (en/zh).☆14Feb 13, 2026Updated last month
- DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling☆36Jul 12, 2024Updated last year
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents!☆143Apr 2, 2026Updated last week
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"☆42Mar 31, 2025Updated last year