OS-Copilot / OS-Atlas
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
☆133 Updated this week
Related projects
Alternatives and complementary repositories for OS-Atlas
- Official Repo for UGround ☆93 Updated this week
- AWM: Agent Workflow Memory ☆203 Updated last month
- Environments, tools, and benchmarks for general computer agents ☆172 Updated 2 weeks ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆166 Updated this week
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆115 Updated last month
- ☆116 Updated 5 months ago
- CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/ ☆187 Updated this week
- This is a collection of resources for computer-use agents, including videos, blogs, papers, and projects. ☆85 Updated this week
- Code for the paper 🌳 Tree Search for Language Model Agents ☆138 Updated 3 months ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆109 Updated 2 months ago
- 🤠 Agent-as-a-Judge and DevAI dataset ☆184 Updated last week
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆122 Updated 2 weeks ago
- ☆11 Updated last week
- Code for Paper: Harnessing Webpage UIs For Text Rich Visual Understanding ☆37 Updated 3 weeks ago
- VisualWebArena is a benchmark for multimodal agents. ☆236 Updated this week
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆106 Updated 2 weeks ago
- ☆128 Updated last week
- Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning. ☆254 Updated last month
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents. ☆178 Updated 2 weeks ago
- ☆102 Updated 2 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆75 Updated 3 weeks ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆114 Updated this week
- ☆35 Updated last year
- ☆76 Updated 10 months ago
- ☆283 Updated last month
- This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for E… ☆353 Updated this week
- The Official Code Repository for GUI-World. ☆37 Updated 3 months ago
- The model, data and code for the visual GUI Agent SeeClick ☆216 Updated 2 months ago
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and … ☆328 Updated 4 months ago
- ☆101 Updated last month