Multimodal computer agent data collection program
☆164 Dec 5, 2025 Updated 2 months ago
Alternatives and similar repositories for DuckTrack
Users who are interested in DuckTrack are comparing it to the repositories listed below.
- A Universal Platform for Training and Evaluation of Mobile Interaction ☆60 Sep 24, 2025 Updated 5 months ago
- WONDERBREAD benchmark + dataset for BPM tasks ☆34 Jul 30, 2025 Updated 7 months ago
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1… ☆14 Jan 19, 2024 Updated 2 years ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆148 Nov 26, 2024 Updated last year
- ☆41 Jul 21, 2024 Updated last year
- Cerule - A Tiny Mighty Vision Model ☆68 Nov 9, 2025 Updated 3 months ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 Dec 14, 2023 Updated 2 years ago
- ☆16 Apr 9, 2021 Updated 4 years ago
- Self-hosted GPT-4V api ☆27 Nov 6, 2023 Updated 2 years ago
- [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments ☆2,608 Updated this week
- Platform to run interactive Reinforcement Learning agents in a Minecraft Server ☆56 Feb 20, 2026 Updated last week
- [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents ☆300 Jul 18, 2025 Updated 7 months ago
- ☆20 Jan 27, 2024 Updated 2 years ago
- ☆20 Jan 7, 2024 Updated 2 years ago
- VisualWebArena is a benchmark for multimodal agents. ☆440 Nov 9, 2024 Updated last year
- ☆150 Jan 4, 2024 Updated 2 years ago
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents ☆21 Jan 6, 2026 Updated last month
- 🌎💪 BrowserGym, a Gym environment for web task automation ☆1,136 Feb 10, 2026 Updated 2 weeks ago
- ☆20 Apr 24, 2024 Updated last year
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆148 May 29, 2025 Updated 9 months ago
- Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents" ☆1,353 Nov 26, 2025 Updated 3 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆48 Feb 27, 2025 Updated last year
- Display tensors directly from GPU ☆11 Oct 12, 2025 Updated 4 months ago
- Simple program to manually caption your images (or any other file types) so you can use them for AI training ☆37 Mar 20, 2023 Updated 2 years ago
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ☆437 Apr 20, 2025 Updated 10 months ago
- A collection of my tools related to notetaking ☆10 Apr 18, 2021 Updated 4 years ago
- ☆11 Oct 13, 2023 Updated 2 years ago
- ☆13 Mar 22, 2023 Updated 2 years ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆161 Apr 3, 2024 Updated last year
- Extract full next-token probabilities via language model APIs ☆248 Feb 23, 2024 Updated 2 years ago
- ☆35 Aug 16, 2024 Updated last year
- A Sparse-tensor Communication Framework for Distributed Deep Learning ☆13 Nov 1, 2021 Updated 4 years ago
- Latent Large Language Models ☆19 Aug 24, 2024 Updated last year
- Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models ☆15 Jan 3, 2024 Updated 2 years ago
- POC integration Airbyte+Dagster+Langchain ☆13 Jun 1, 2023 Updated 2 years ago
- ☆34 Sep 10, 2024 Updated last year
- Microsoft's open source max-min fair solver for cluster scheduling and traffic engineering ☆18 Feb 11, 2026 Updated 2 weeks ago
- A codebase for "Language Models can Solve Computer Tasks" ☆240 May 1, 2024 Updated last year
- [ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult… ☆826 Feb 3, 2025 Updated last year