asappresearch / webagents-step

☆41

Alternatives and similar repositories for webagents-step

Users who are interested in webagents-step are comparing it to the libraries listed below.

oriyor / assistantbench
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆59 Updated 7 months ago
McGill-NLP / weblinx
WebLINX is a benchmark for building web navigation agents with conversational capabilities
☆156 Updated 5 months ago
kohjingyu / search-agents
Code for the paper 🌳 Tree Search for Language Model Agents
☆208 Updated last year
data-for-agents / insta
Official Repo for InSTA: Towards Internet-Scale Training For Agents
☆52 Updated 3 weeks ago
google-deepmind / pix2act
☆59 Updated last year
microsoft / simulated-trial-and-error
☆122 Updated last year
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149 Updated 6 months ago
TheDuckAI / DuckTrack
Multimodal computer agent data collection program
☆141 Updated last year
allenai / clin
☆83 Updated last year
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆139 Updated 8 months ago
Ag2S1 / Sibyl-System
☆123 Updated 11 months ago
InternLM / SWE-Fixer
☆108 Updated 2 months ago
SalesforceAIResearch / LaTRO
☆118 Updated 5 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68 Updated 3 months ago
sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆49 Updated 9 months ago
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆91 Updated 2 months ago
ServiceNow / WorkArena
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
☆200 Updated this week
SalesforceAIResearch / swecomm
☆27 Updated 6 months ago
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆105 Updated 7 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88 Updated 10 months ago
SalesforceAIResearch / CodeTree
Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models
☆24 Updated 4 months ago
archiki / ADaPT
Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models"
☆87 Updated last year
yueqis / API-Based-Agent
☆54 Updated last month
agiresearch / Formal-LLM
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
☆125 Updated last year
jlin816 / dialop
DialOp: Decision-oriented dialogue environments for collaborative language agents
☆109 Updated 8 months ago
Agent-E3 / ExACT
☆20 Updated 4 months ago
ltzheng / Synapse
[ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control
☆59 Updated 6 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 6 months ago
SALT-NLP / demonstrated-feedback
☆125 Updated 10 months ago
MLE-Dojo / MLE-Dojo
☆61 Updated last week