amazon-science / AgentOccam
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
☆27Updated 5 months ago
Alternatives and similar repositories for AgentOccam
Users who are interested in AgentOccam are comparing it to the libraries listed below.
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets"☆57Updated last year
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans.☆88Updated 3 months ago
- Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge☆56Updated this week
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆146Updated 8 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆106Updated 4 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆225Updated 2 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆101Updated last month
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated last year
- ☆234Updated 11 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)☆37Updated 6 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆103Updated 2 months ago
- A benchmark list for evaluation of large language models.☆130Updated 2 weeks ago
- ☆38Updated 4 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆138Updated 7 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆60Updated 9 months ago
- MPO: Boosting LLM Agents with Meta Plan Optimization☆63Updated 4 months ago
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"☆78Updated 3 months ago
- InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks (ICML 2024)☆138Updated last month
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆82Updated last month
- ☆102Updated 7 months ago
- [ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use☆89Updated last year
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆58Updated 4 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆115Updated 8 months ago
- Augmented LLM with self-reflection☆129Updated last year
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use☆151Updated last year
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"☆226Updated 2 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated last year
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A…☆45Updated last year
- The official repo for the code and data of paper SMART☆28Updated 4 months ago
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization☆154Updated last year