OpenGVLab / GUI-Odyssey
GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. It consists of 7,735 episodes collected on 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combinations.
☆69 · Updated last week
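For readers who want to inspect the episode data directly, the sketch below shows one way to load and explore it with the Hugging Face `datasets` library. The Hub repo id (`OpenGVLab/GUI-Odyssey`), the split name, and the printed fields are assumptions, not guaranteed by this listing; the repository's README is the authoritative source for download and format instructions.

```python
# A minimal sketch of loading and inspecting GUI-Odyssey with the `datasets`
# library. The Hub repo id and split name are assumptions; check the project
# README for the official download path and schema.
from datasets import load_dataset

ds = load_dataset("OpenGVLab/GUI-Odyssey", split="train")  # assumed repo id / split

print(f"{len(ds)} records loaded")
print(ds.column_names)  # discover the actual field names before relying on any
print(ds[0])            # peek at one record
```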
Related projects
Alternatives and complementary repositories for GUI-Odyssey
- ☆20 · Updated last month
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆83 · Updated 4 months ago
- Official repository of MMDU dataset ☆75 · Updated last month
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆48 · Updated last month
- ☆58 · Updated 9 months ago
- ☆73 · Updated 8 months ago
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆102 · Updated 3 weeks ago
- ☆85 · Updated 10 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆78 · Updated 10 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆67 · Updated 4 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆26 · Updated 4 months ago
- ☆131 · Updated 10 months ago
- ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ☆95 · Updated 4 months ago
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities. ☆69 · Updated last month
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆48 · Updated 7 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning ☆84 · Updated last month
- The Official Code Repository for GUI-World. ☆41 · Updated 3 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆25 · Updated last week
- A Survey on Benchmarks of Multimodal Large Language Models ☆64 · Updated last month
- ☆121 · Updated 3 weeks ago
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs) ☆33 · Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆38 · Updated 4 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆120 · Updated this week
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆89 · Updated last week
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024) ☆184 · Updated this week
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆162 · Updated 2 months ago
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models ☆19 · Updated 2 months ago
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆22 · Updated 4 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆235 · Updated 2 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆47 · Updated last month