uivision / UI-Vision
☆28 · Updated 5 months ago
Alternatives and similar repositories for UI-Vision
Users who are interested in UI-Vision are comparing it to the repositories listed below.
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆147 · Updated last year
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆61 · Updated last year
- [TMLR'25] "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents" ☆93 · Updated 2 months ago
- ☆21 · Updated 7 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 · Updated 6 months ago
- [ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials ☆45 · Updated 9 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆138 · Updated 6 months ago
- ☆62 · Updated last month
- ☆66 · Updated 6 months ago
- ☆51 · Updated 10 months ago
- ☆65 · Updated 9 months ago
- [NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆134 · Updated last month
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆62 · Updated 11 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆37 · Updated 11 months ago
- ☆52 · Updated 7 months ago
- An Illusion of Progress? Assessing the Current State of Web Agents ☆124 · Updated this week
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆50 · Updated last month
- Code for "Reasoning to Learn from Latent Thoughts" ☆123 · Updated 8 months ago
- Official Repo for MageBench: Bridging Large Multimodal Models to Agents ☆22 · Updated 11 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆49 · Updated last year
- [NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge ☆93 · Updated 2 weeks ago
- ☆58 · Updated last year
- ☆12 · Updated last year
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents ☆24 · Updated last month
- ☆114 · Updated 2 months ago
- ☆20 · Updated last year
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆73 · Updated last year
- ☆118 · Updated 8 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆69 · Updated 5 months ago
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant ☆42 · Updated 11 months ago