xlang-ai / Spider2-V
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
☆136 Aug 26, 2024 Updated last year
Alternatives and similar repositories for Spider2-V
Users who are interested in Spider2-V are comparing it to the repositories listed below.
- ☆15 Jul 9, 2025 Updated 7 months ago
- Extending context length of visual language models ☆12 Dec 18, 2024 Updated last year
- [ICLR 2025 Oral] Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows ☆726 Jan 30, 2026 Updated 2 weeks ago
- Code for the paper "Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving" ☆19 May 25, 2023 Updated 2 years ago
- ☆25 Aug 23, 2024 Updated last year
- [ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials ☆50 Feb 21, 2025 Updated 11 months ago
- ☆54 Aug 25, 2023 Updated 2 years ago
- This repository contains data, code and models for contextual noncompliance. ☆25 Jul 18, 2024 Updated last year
- code for the table-based open domain question answering project, with paper title: "Reasoning over Hybrid Chain for Table-and-Text Open D… ☆12 Sep 16, 2022 Updated 3 years ago
- ☆61 Nov 18, 2024 Updated last year
- Web-grounded natural language instructions ☆18 Nov 25, 2024 Updated last year
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆44 Apr 3, 2025 Updated 10 months ago
- [ICML2025] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction ☆379 Mar 7, 2025 Updated 11 months ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. ☆104 Jan 14, 2026 Updated last month
- WONDERBREAD benchmark + dataset for BPM tasks ☆34 Jul 30, 2025 Updated 6 months ago
- Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion ☆14 Jul 26, 2023 Updated 2 years ago
- [EMNLP'23] Code for Generating Data for Symbolic Language with Large Language Models ☆18 Oct 21, 2023 Updated 2 years ago
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents ☆555 Oct 28, 2023 Updated 2 years ago
- [NeurIPS 2025 Spotlight] Scaling Computer-Use Grounding via UI Decomposition and Synthesis ☆148 Nov 6, 2025 Updated 3 months ago
- [NAACL 2024] A Synthetic, Scalable and Systematic Evaluation Suite for Large Language Models ☆33 Jun 10, 2024 Updated last year
- [ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆189 Sep 13, 2025 Updated 5 months ago
- OpenReview Submission Visualization (ICLR 2024/2025) ☆154 Oct 17, 2024 Updated last year
- ☆33 Jun 24, 2024 Updated last year
- ☆19 Jun 13, 2024 Updated last year
- The project page for "SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables" ☆23 Dec 21, 2023 Updated 2 years ago
- ☆27 Jul 23, 2025 Updated 6 months ago
- Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing" ☆22 Mar 18, 2025 Updated 10 months ago
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆78 Nov 25, 2024 Updated last year
- [ICLR 2023] Code for the paper "Binding Language Models in Symbolic Languages" ☆325 Aug 25, 2023 Updated 2 years ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆24 Sep 26, 2024 Updated last year
- ☆18 Sep 5, 2024 Updated last year
- ☆21 May 24, 2024 Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆64 Jul 8, 2024 Updated last year
- InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks (ICML 2024) ☆181 May 29, 2025 Updated 8 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆136 Jul 17, 2024 Updated last year
- Kernel Playground - A playground to run large scale experiments on the Linux Kernel ☆17 Nov 8, 2025 Updated 3 months ago
- A framework for human-readable prompt-based method with large language models. Specially designed for researchers. (Deprecated, check out… ☆131 Feb 25, 2023 Updated 2 years ago
- Paper collection on building and evaluating language model agents via executable language grounding ☆364 Apr 29, 2024 Updated last year
- ☆69 Dec 15, 2024 Updated last year