microsoft / VisEval
☆24 Updated last month
Related projects
Alternatives and complementary repositories for VisEval
- ☆54 Updated 6 months ago
- ☆51 Updated 4 months ago
- Code for paper Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding ☆45 Updated 5 months ago
- Vega-Lite Chart Dataset and NL Generation Framework using LLMs ☆102 Updated 5 months ago
- Code for Our EMNLP (Industry) 2023 paper "LLM4Vis: Explainable Visualization Recommendation using ChatGPT" ☆22 Updated 9 months ago
- A collection of works that investigate social agents, simulations and their real-world impact in text, embodied, and robotics contexts. ☆63 Updated 5 months ago
- SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation ☆14 Updated last month
- ☆35 Updated 5 months ago
- [ICML 2024 Oral] A framework for society simulation that supports complex simulation, for example: multi-scene. ☆52 Updated 3 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆122 Updated last week
- VisText is a benchmark dataset for semantically rich chart captioning. ☆83 Updated last year
- The Official Code Repository for GUI-World. ☆41 Updated 3 months ago
- ncNet is a Transformer-based model for supporting NL2VIS. ☆36 Updated 2 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆99 Updated 3 weeks ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆75 Updated last month
- ☆85 Updated 6 months ago
- A Comprehensive Benchmark for Software Development. ☆84 Updated 5 months ago
- ☆37 Updated 2 months ago
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆111 Updated 6 months ago
- ☆89 Updated 3 months ago
- Code for the 2024 arXiv publication "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Mo… ☆22 Updated 4 months ago
- Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows ☆156 Updated last week
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆46 Updated 2 weeks ago
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆8 Updated last month
- CHI 2021 Paper Website ☆10 Updated 3 years ago
- ☆70 Updated 3 years ago
- An LLM-based agent that predicts its tasks proactively. ☆16 Updated 2 months ago
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆55 Updated 4 months ago
- ☆17 Updated 4 months ago
- ☆17 Updated 4 months ago