asappresearch / webagents-step
☆40 Updated 10 months ago
Alternatives and similar repositories for webagents-step
Users that are interested in webagents-step are comparing it to the libraries listed below
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆56 Updated 5 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆199 Updated 10 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆136 Updated 6 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆42 Updated this week
- ☆114 Updated 3 months ago
- Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆49 Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 9 months ago
- ☆56 Updated last year
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆148 Updated 3 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆62 Updated 8 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆146 Updated 4 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆104 Updated 5 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆47 Updated last year
- Verifiers for LLM Reinforcement Learning ☆55 Updated last month
- ☆121 Updated 11 months ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆34 Updated 7 months ago
- Functional Benchmarks and the Reasoning Gap ☆86 Updated 8 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 Updated 8 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated last year
- ☆82 Updated last year
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆54 Updated last year
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans. ☆81 Updated last month
- LangCode - Improving alignment and reasoning of large language models (LLMs) with natural language embedded program (NLEP). ☆42 Updated last year
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆123 Updated 11 months ago
- WONDERBREAD benchmark + dataset for BPM tasks ☆24 Updated 7 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆89 Updated 2 weeks ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆21 Updated 2 months ago
- Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement ☆92 Updated 3 months ago
- ☆120 Updated 9 months ago
- ☆120 Updated 8 months ago