TheDuckAI / DuckTrack

Multimodal computer agent data collection program

☆141

Alternatives and similar repositories for DuckTrack

Users who are interested in DuckTrack are comparing it to the repositories listed below.

McGill-NLP / weblinx
WebLINX is a benchmark for building web navigation agents with conversational capabilities
☆156 · Updated 5 months ago
asappresearch / webagents-step
☆41 · Updated last year
allenai / clin
☆83 · Updated last year
convergence-ai / webgames
Challenges for general-purpose web-browsing AI agents
☆62 · Updated 2 months ago
CogNLP / CogAGENT
☆35 · Updated 2 years ago
xlang-ai / computer-agent-arena
Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon!
☆47 · Updated 3 months ago
posgnu / rci-agent
A codebase for "Language Models can Solve Computer Tasks"
☆234 · Updated last year
oriyor / assistantbench
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆59 · Updated 7 months ago
gabegrand / lilo
LILO: Library Induction with Language Observations
☆88 · Updated 11 months ago
magicproduct / hash-hop
Long context evaluation for large language models
☆220 · Updated 5 months ago
allenai / CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
☆90 · Updated last year
reka-ai / reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
☆173 · Updated 7 months ago
Ag2S1 / Sibyl-System
☆123 · Updated 11 months ago
ai8hyf / OpenResearchAssistant
An automated tool for discovering insights from research paper corpora
☆138 · Updated last year
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 · Updated 6 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149 · Updated 6 months ago
data-for-agents / insta
Official Repo for InSTA: Towards Internet-Scale Training For Agents
☆52 · Updated 3 weeks ago
InflectionAI / Inflection-Benchmarks
Public Inflection Benchmarks
☆68 · Updated last year
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆139 · Updated 8 months ago
SalesforceAIResearch / LaTRO
☆118 · Updated 5 months ago
commit-0 / commit0
Commit0: Library Generation from Scratch
☆160 · Updated 2 months ago
google-deepmind / pix2act
☆59 · Updated last year
agiresearch / Formal-LLM
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
☆124 · Updated last year
JoshuaPurtell / SmallBench
Small, simple agent task environments for training and evaluation
☆18 · Updated 9 months ago
ltzheng / agent-studio
[ICLR 2025] A trinity of environments, tools, and benchmarks for general virtual agents
☆214 · Updated last month
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆68 · Updated 3 months ago
sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆49 · Updated 9 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88 · Updated 10 months ago
ShengranHu / Thought-Cloning
[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
☆268 · Updated last year
lz1oceani / verify_cot
☆134 · Updated last year