boyugou / llava_uground
☆14 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for llava_uground
- Official Repo for UGround ☆100 · Updated 2 weeks ago
- The model, data and code for the visual GUI Agent SeeClick ☆227 · Updated this week
- This is a collection of resources for computer-use agents, including videos, blogs, papers, and projects. ☆105 · Updated 2 weeks ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆84 · Updated 4 months ago
- VisualWebArena is a benchmark for multimodal agents. ☆246 · Updated 2 weeks ago
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ☆173 · Updated this week
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆118 · Updated last month
- Environments, tools, and benchmarks for general computer agents ☆172 · Updated last month
- AndroidWorld is an environment and benchmark for autonomous agents ☆137 · Updated this week
- Code for the paper 🌳 Tree Search for Language Model Agents ☆140 · Updated 3 months ago
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents. ☆258 · Updated this week
- [IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc… ☆50 · Updated last month
- Towards Large Multimodal Models as Visual Foundation Agents ☆123 · Updated last week
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆198 · Updated 4 months ago
- CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/ ☆192 · Updated last week
- Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆124 · Updated 3 weeks ago
- Official repo for the paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning. ☆265 · Updated last month
- ☆21 · Updated last month
- AWM: Agent Workflow Memory ☆210 · Updated last month
- The Official Code Repository for GUI-World. ☆41 · Updated 3 months ago
- ☆78 · Updated 11 months ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆213 · Updated last week
- ☆38 · Updated 4 months ago
- Code for the paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆97 · Updated last month
- Code for the paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆39 · Updated last month
- ☆51 · Updated 10 months ago
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆72 · Updated 10 months ago
- GPT-4V in Wonderland: LMMs as Smartphone Agents ☆128 · Updated 4 months ago
- Building a comprehensive and handy list of papers for GUI agents ☆34 · Updated this week
- ScreenQA dataset was introduced in the "ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots" paper. It contains ~86K … ☆91 · Updated 4 months ago