xbmxb / CoCo-Agent
☆20Updated 4 months ago
Related projects ⓘ
Alternatives and complementary repositories for CoCo-Agent
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr…☆65Updated 4 months ago
- ☆19Updated last month
- GUICourse: From General Vision Langauge Models to Versatile GUI Agents☆80Updated 3 months ago
- The Official Code Repository for GUI-World.☆37Updated 3 months ago
- Towards Large Multimodal Models as Visual Foundation Agents☆114Updated last week
- Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"☆37Updated 4 months ago
- Official repository for paper "GTA: A Benchmark for General Tool Agents" (NeurIPS 2024 D&B Track)☆43Updated this week
- A curated list of the papers, repositories, tutorials, and anythings related to the large language models for tools☆64Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆42Updated 2 weeks ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)☆42Updated 3 weeks ago
- ☆12Updated 6 months ago
- ☆53Updated 7 months ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆72Updated 8 months ago
- ☆34Updated 10 months ago
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆24Updated last week
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆35Updated last month
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆37Updated 6 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆96Updated last week
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆83Updated 3 weeks ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation☆48Updated last year
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆26Updated 3 weeks ago
- Touchstone: Evaluating Vision-Language Models by Language Models☆77Updated 9 months ago
- ☆29Updated 2 weeks ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆100Updated 7 months ago
- ☆45Updated last year
- ☆37Updated 5 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆41Updated 4 months ago
- DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models☆56Updated 2 weeks ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆26Updated 4 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆46Updated 3 weeks ago