infly-ai / INF-MLLM
☆54Updated 4 months ago
Alternatives and similar repositories for INF-MLLM:
Users who are interested in INF-MLLM are comparing it to the repositories listed below
- Touchstone: Evaluating Vision-Language Models by Language Models☆80Updated last year
- ☆94Updated last year
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆109Updated last month
- ☆42Updated 5 months ago
- SVIT: Scaling up Visual Instruction Tuning☆164Updated 7 months ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆252Updated 6 months ago
- ☆87Updated last year
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆59Updated 3 months ago
- ☆20Updated 10 months ago
- ☆132Updated last year
- A collection of visual instruction tuning datasets.☆76Updated 10 months ago
- The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". Th…☆42Updated 2 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆78Updated last month
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"☆110Updated this week
- Official repository of MMDU dataset☆82Updated 3 months ago
- ☆59Updated 11 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆24Updated last month
- ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI☆96Updated 6 months ago
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.☆21Updated 10 months ago
- ☆47Updated last year
- ☆73Updated 10 months ago
- The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…☆18Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆43Updated 7 months ago
- Lion: Kindling Vision Intelligence within Large Language Models☆52Updated 11 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆56Updated last year
- Dataset pruning for ImageNet and LAION-2B.☆70Updated 6 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆74Updated this week
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation☆84Updated 4 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆120Updated 3 months ago
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model☆23Updated last year