infly-ai / INF-MLLM
☆52Updated 3 weeks ago
Related projects:
- Touchstone: Evaluating Vision-Language Models by Language Models☆75Updated 8 months ago
- This is the official implementation of the paper "Needle In A Multimodal Haystack"☆72Updated 2 months ago
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆49Updated last month
- A collection of visual instruction tuning datasets.☆74Updated 6 months ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated 10 months ago
- ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI☆84Updated 2 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆64Updated 2 weeks ago
- SVIT: Scaling up Visual Instruction Tuning☆159Updated 3 months ago
- Official repository of MMDU dataset☆61Updated last month
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆239Updated 2 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆40Updated 3 months ago
- LVBench: An Extreme Long Video Understanding Benchmark☆51Updated 2 weeks ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆28Updated last year
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs)☆32Updated 10 months ago
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆20Updated 6 months ago
- Lion: Kindling Vision Intelligence within Large Language Models☆52Updated 7 months ago
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs☆66Updated last month
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆58Updated 3 months ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations☆47Updated 2 months ago
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"☆143Updated 2 weeks ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆54Updated last year
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?☆65Updated last week
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆92Updated 2 months ago