YuxiangChai / AMEX-codebase
☆14 · Updated last week
Related projects:
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆68 · Updated 2 months ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆57 · Updated 2 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆52 · Updated 2 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆39 · Updated 3 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆87 · Updated 3 weeks ago
- The Official Code Repository for GUI-World. ☆33 · Updated last month
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆92 · Updated 2 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆36 · Updated 5 months ago
- ☆46 · Updated 10 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆51 · Updated 3 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆21 · Updated 2 months ago
- ☆70 · Updated 6 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆48 · Updated 4 months ago
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs) ☆32 · Updated 10 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" ☆158 · Updated last week
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆38 · Updated 2 months ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆75 · Updated 8 months ago
- This is the official implementation of the paper "Needle In A Multimodal Haystack" ☆72 · Updated 2 months ago
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆66 · Updated last month
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities. ☆53 · Updated 2 weeks ago
- ☆73 · Updated 8 months ago
- ☆53 · Updated 7 months ago
- A curated list of papers, repositories, tutorials, and anything else related to large language models for tools ☆64 · Updated last year
- ☆24 · Updated 7 months ago
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆21 · Updated 2 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement ☆21 · Updated last month
- ☆31 · Updated 3 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆36 · Updated 2 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆40 · Updated 2 months ago
- [ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents ☆16 · Updated 5 months ago