TheDuckAI / DuckTrack
Multimodal computer agent data collection program
☆129 Updated last year
Alternatives and similar repositories for DuckTrack
Users who are interested in DuckTrack are comparing it to the repositories listed below
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆146 Updated 3 months ago
- ☆40 Updated 9 months ago
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆78 Updated last year
- ☆37 Updated 2 years ago
- Public Inflection Benchmarks ☆68 Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆135 Updated 5 months ago
- Evaluating LLMs with CommonGen-Lite ☆90 Updated last year
- ☆51 Updated 9 months ago
- ☆82 Updated last year
- LILO: Library Induction with Language Observations ☆86 Updated 8 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention ☆119 Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 9 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆145 Updated 3 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆54 Updated 5 months ago
- Official repo for Learning to Reason for Long-Form Story Generation ☆44 Updated 3 weeks ago
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆44 Updated last month
- An automated tool for discovering insights from research paper corpora ☆138 Updated 11 months ago
- An AI agent for interacting with a computer using the graphical user interface ☆77 Updated last year
- A set of utilities for running few-shot prompting experiments on large-language models ☆120 Updated last year
- Multimodal language model benchmark, featuring challenging examples ☆167 Updated 4 months ago
- Commit0: Library Generation from Scratch ☆144 Updated last month
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆51 Updated 5 months ago
- A lightweight script for processing HTML page to markdown format with support for code blocks ☆79 Updated last year
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents ☆201 Updated 3 weeks ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 6 months ago
- ☆114 Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 3 months ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆60 Updated last month
- Camel-Coder: Collaborative task completion with multiple agents. Role-based prompts, intervention mechanism, and thoughtful suggestions ☆33 Updated last year
- The Next Generation Multi-Modality Superintelligence ☆71 Updated 8 months ago