zwq2018 / Multi-modal-Self-instruct
The codebase for our paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
☆37 Updated last month
Related projects:
- VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆21 Updated 2 months ago
- Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆44 Updated 3 weeks ago
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs) ☆32 Updated 10 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆40 Updated 2 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆54 Updated last year
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆73 Updated 2 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆51 Updated 3 months ago
- Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆34 Updated last month
- The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins… ☆18 Updated 10 months ago
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆35 Updated this week
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆40 Updated 3 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆51 Updated last year
- This is the official implementation of the paper "Needle In A Multimodal Haystack" ☆72 Updated 2 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆22 Updated last month
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆52 Updated 5 months ago
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆21 Updated 2 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆44 Updated 8 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆42 Updated 9 months ago
- Official Dataloader and Evaluation Scripts for LongVideoBench. ☆52 Updated last month
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆30 Updated 2 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 Updated last week