asappresearch / webagents-step
☆38 Updated 4 months ago
Related projects
Alternatives and complementary repositories for webagents-step
- Code for the paper 🌳 Tree Search for Language Model Agents ☆138 Updated 3 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆40 Updated last month
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆110 Updated 3 weeks ago
- Official Repo for UGround ☆97 Updated last week
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆118 Updated last month
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆51 Updated 2 months ago
- ☆116 Updated 5 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆75 Updated last month
- ☆22 Updated this week
- ☆51 Updated 10 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆91 Updated 3 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆65 Updated 4 months ago
- An implementation of Everything of Thoughts (XoT). ☆132 Updated 9 months ago
- Can Language Models Solve Olympiad Programming? ☆100 Updated 3 months ago
- Evaluating tool-augmented LLMs in conversation settings ☆72 Updated 5 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated 2 months ago
- ☆35 Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆97 Updated last year
- Official repository for paper "GTA: A Benchmark for General Tool Agents" (NeurIPS 2024 D&B Track) ☆45 Updated 2 weeks ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments ☆32 Updated last month
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? ☆127 Updated 3 weeks ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆41 Updated 9 months ago
- ☆37 Updated 3 weeks ago
- ☆103 Updated 3 months ago
- Data preparation code for CrystalCoder 7B LLM ☆42 Updated 6 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆115 Updated last week
- ☆78 Updated 11 months ago
- ☆39 Updated this week
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆44 Updated 10 months ago