QQ-MM / PureMM

Related projects:
- Repository of the paper "Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models"
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"
- Official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…"
- Official implementation of the paper "Needle In A Multimodal Haystack"
- Official PyTorch implementation of "Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment"
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?"
- A collection of visual instruction tuning datasets
- Released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models"
- Official repository of the MMDU dataset
- 🦩 Visual Instruction Tuning with Polite Flamingo: training multi-modal LLMs to be both clever and polite (AAAI-24 Oral)
- Official repo for "Contrastive Vision-Language Alignment Makes Efficient Instruction Learner"
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
- [EMNLP'23] Official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
- Code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
- Official code for the paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"