X-LANCE / META-GUI-baseline
[EMNLP 2022] The baseline code for META-GUI dataset
Related projects:
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
- Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement
- GUICourse: From General Vision Language Models to Versatile GUI Agents
- Multi-modal code generation problems.
- This is the implementation of LeCo
- [ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
- My commonly used tools
- Android in the Zoo: Chain-of-Action-Thought for GUI Agents
- A curated list of resources about long context in large language models and video understanding.
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling
- Visual and Embodied Concepts evaluation benchmark
- [ICML'2024] Can AI Assistants Know What They Don't Know?
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
- GPT as Human
- Do Large Language Models Know What They Don’t Know?
- Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
- Released code for our ICLR23 paper.
- Evaluating Mathematical Reasoning Beyond Accuracy
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"
- Data for evaluating GPT-4V
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain