microsoft / GUI-Actor
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
☆309 · Updated last month
Alternatives and similar repositories for GUI-Actor
Users interested in GUI-Actor compare it to the repositories listed below.
- [ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction · ☆340 · Updated 5 months ago
- GUI Grounding for Professional High-Resolution Computer Use · ☆238 · Updated last month
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents · ☆262 · Updated 3 weeks ago
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis · ☆151 · Updated 3 weeks ago
- PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World · ☆275 · Updated 2 months ago
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents · ☆214 · Updated last month
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents · ☆363 · Updated 3 months ago
- ☆288 · Updated 2 months ago
- ☆88 · Updated this week
- An open platform for enhancing the capability of LLMs in workflow orchestration · ☆160 · Updated 4 months ago
- A MemAgent framework that can be extrapolated to 3.5M, along with a training framework for RL training of any agent workflow · ☆588 · Updated last week
- ☆228 · Updated 3 months ago
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" · ☆80 · Updated 2 months ago
- Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforce… · ☆293 · Updated last month
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL · ☆437 · Updated 2 months ago
- ✨✨Latest Papers and Datasets on Mobile and PC GUI Agent · ☆131 · Updated 8 months ago
- 🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/ · ☆364 · Updated last month
- Implementation for OAgents: An Empirical Study of Building Effective Agents · ☆101 · Updated last week
- The model, data and code for the visual GUI Agent SeeClick · ☆411 · Updated 3 weeks ago
- ☆90 · Updated last week
- Towards Large Multimodal Models as Visual Foundation Agents · ☆225 · Updated 3 months ago
- Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning" · ☆122 · Updated 2 months ago
- Open-sourced, Fast and Context-aware Action Grounding from GUI Instructions for GUI/Computer-use Agents · ☆374 · Updated 6 months ago
- ☆77 · Updated 4 months ago
- An LLM-based agent that predicts its tasks proactively · ☆402 · Updated 2 months ago
- SkillWeaver is a framework to enable web agent self-improvement through environment exploration and skill synthesis · ☆91 · Updated 3 months ago
- MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search too… · ☆272 · Updated last week
- ReasonFlux Series - A family of LLM post-training algorithms focusing on data selection, reinforcement learning, and inference scaling · ☆470 · Updated 2 weeks ago
- ☆82 · Updated 3 weeks ago
- Efficient Agent Training for Computer Use · ☆122 · Updated 2 months ago