xlang-ai / Spider2-V
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
☆116Updated 5 months ago
Alternatives and similar repositories for Spider2-V:
Users that are interested in Spider2-V are comparing it to the libraries listed below
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated 11 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios☆65Updated 2 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆110Updated last month
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆135Updated 3 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆141Updated 2 months ago
- [ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning☆206Updated last month
- Offical Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"☆212Updated this week
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆125Updated 2 months ago
- ☆209Updated 6 months ago
- ☆48Updated 11 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following☆119Updated 7 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆116Updated 3 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't…☆108Updated 7 months ago
- Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models☆84Updated 10 months ago
- This is the code repo for our paper "Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents".☆102Updated 3 months ago
- Reformatted Alignment☆114Updated 4 months ago
- ☆120Updated 8 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated last year
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA☆111Updated 3 months ago
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets"☆51Updated 8 months ago
- ☆98Updated 2 months ago
- ☆129Updated last month
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…☆112Updated 8 months ago
- Codes and Data for ACL 2024 Paper "Faithful Logical Reasoning via Symbolic Chain-of-Thought".☆173Updated 7 months ago
- AWM: Agent Workflow Memory☆239Updated 2 weeks ago
- ☆137Updated last year
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆45Updated 2 months ago
- ☆42Updated 8 months ago
- official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"☆59Updated last year
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models☆57Updated 10 months ago