RUCBM / GUICourse
GUICourse: From General Vision Language Models to Versatile GUI Agents
☆92 · Updated 5 months ago
Alternatives and similar repositories for GUICourse:
Users who are interested in GUICourse are comparing it to the libraries listed below.
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024) ☆59 · Updated 2 months ago
- The Official Code Repository for GUI-World. ☆44 · Updated 4 months ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆76 · Updated last month
- Towards Large Multimodal Models as Visual Foundation Agents ☆142 · Updated 3 weeks ago
- ☆25 · Updated 2 months ago
- Official Repo for UGround ☆116 · Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆47 · Updated last month
- The model, data and code for the visual GUI Agent SeeClick ☆248 · Updated 3 weeks ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆84 · Updated last month
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆105 · Updated 3 weeks ago
- ☆75 · Updated 9 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆99 · Updated 9 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆43 · Updated last month
- A Universal Platform for Training and Evaluation of Mobile Interaction ☆39 · Updated last month
- ☆87 · Updated 11 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆54 · Updated 3 months ago
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities. ☆73 · Updated 2 months ago
- ☆58 · Updated 10 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆115 · Updated 2 months ago
- A Survey on Benchmarks of Multimodal Large Language Models ☆72 · Updated 2 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆79 · Updated 10 months ago
- Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024) ☆206 · Updated 5 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆61 · Updated 6 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆72 · Updated 5 months ago
- ☆47 · Updated 6 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆40 · Updated last month
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ☆89 · Updated this week
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆28 · Updated 5 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning ☆87 · Updated 2 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆48 · Updated 7 months ago