Agent-E3 / ExACT
☆21 Updated 2 months ago
Alternatives and similar repositories for ExACT:
Users who are interested in ExACT are comparing it to the libraries listed below.
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆54 Updated last week
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆107 Updated last month
- ☆48 Updated last month
- The Official Code Repository for GUI-World. ☆44 Updated last month
- ☆29 Updated this week
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆66 Updated 2 weeks ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆88 Updated 3 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆110 Updated 2 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆40 Updated 9 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆47 Updated 3 months ago
- [ACL 2024] The project of Symbol-LLM ☆46 Updated 6 months ago
- ☆14 Updated 8 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆54 Updated last month
- [NeurIPS 2024] Agent Planning with World Knowledge Model ☆102 Updated last month
- Benchmarking Agentic Workflow Generation ☆36 Updated last month
- ☆28 Updated 3 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆46 Updated last year
- WONDERBREAD benchmark + dataset for BPM tasks ☆23 Updated 3 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆106 Updated 8 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆95 Updated 6 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆44 Updated 3 weeks ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆160 Updated 3 weeks ago
- ☆20 Updated 7 months ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups ☆24 Updated 3 weeks ago
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios ☆50 Updated 9 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆57 Updated this week
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆22 Updated 3 months ago
- SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights ☆45 Updated 3 months ago
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ☆90 Updated last month
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆44 Updated last month