TheDuckAI / DuckTrack
Multimodal computer agent data collection program
☆122Updated last year
Alternatives and similar repositories for DuckTrack:
Users who are interested in DuckTrack are comparing it to the libraries listed below.
- WebLINX is a benchmark for building web navigation agents with conversational capabilities☆144Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆52Updated 3 months ago
- Repository for the paper Stream of Search: Learning to Search in Language☆139Updated last month
- Evaluating LLMs with CommonGen-Lite☆89Updated 11 months ago
- ☆39Updated 7 months ago
- An automated tool for discovering insights from research paper corpora☆137Updated 9 months ago
- LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities.☆98Updated this week
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆129Updated 3 months ago
- ☆36Updated last year
- Multimodal language model benchmark, featuring challenging examples☆158Updated 2 months ago
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models"☆74Updated last year
- An AI agent for interacting with a computer using the graphical user interface☆76Updated last year
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans.☆68Updated 3 weeks ago
- A codebase for "Language Models can Solve Computer Tasks"☆233Updated 10 months ago
- Mixing Language Models with Self-Verification and Meta-Verification☆102Updated 3 months ago
- A set of utilities for running few-shot prompting experiments on large-language models☆118Updated last year
- Just a bunch of benchmark logs for different LLMs☆119Updated 7 months ago
- Public Inflection Benchmarks☆68Updated last year
- Code for the paper 🌳 Tree Search for Language Model Agents☆184Updated 7 months ago
- ☆81Updated last year
- Small, simple agent task environments for training and evaluation☆18Updated 4 months ago
- ☆54Updated last year
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898☆209Updated 10 months ago
- My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models"☆98Updated last year
- ☆51Updated 7 months ago
- [ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents☆195Updated 2 weeks ago
- LILO: Library Induction with Language Observations☆83Updated 6 months ago
- ☆114Updated 7 months ago
- Code repository for the c-BTM paper☆105Updated last year
- Commit0: Library Generation from Scratch☆129Updated this week