Letian2003 / MM_INF

An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis (https://arxiv.org/abs/2503.08741).

☆32

Alternatives and similar repositories for MM_INF

Users who are interested in MM_INF are comparing it to the repositories listed below.

opendatalab / MLLM-DataEngine
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
☆48 Updated last year
deepglint / RealSyn
[ACM MM2025] The official repository for the RealSyn dataset
☆37 Updated 4 months ago
BytedanceDouyinContent / SAIL-VL2
The SAIL-VL2 series model developed by the BytedanceDouyinContent Group
☆75 Updated 2 months ago
harrytea / TGDoc
"Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023
☆16 Updated 11 months ago
AdamRain / YFCC15M_downloader
A subset of YFCC100M. Tools, checking scripts, and web-drive links for downloading the datasets (uncompressed).
☆20 Updated last year
mightyzau / InfMLLM
☆19 Updated last year
Kwai-YuanQi / TaskGalaxy
Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
☆32 Updated 4 months ago
scenarios / WeMM
☆87 Updated last year
alibaba / conv-llava
☆123 Updated last year
palchenli / VL-Instruction-Tuning
☆91 Updated last year
SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43 Updated last year
FudanNLPLAB / MouSi
☆75 Updated last year
leolee99 / Online-CNCLIP
ChineseCLIP using online learning
☆13 Updated 3 years ago
mynameischaos / Lion
Lion: Kindling Vision Intelligence within Large Language Models
☆51 Updated last year
PCIResearch / TransCore-M
Large Multimodal Model
☆15 Updated last year
huizhang0110 / catvision
A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p…
☆14 Updated last year
deepglint / MVT
Margin-based Vision Transformer
☆55 Updated last month
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆86 Updated 3 months ago
lucasjinreal / MLLM_Factory
A Dead Simple and Modularized Multi-Modal Training and Finetuning Framework. Compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc series …
☆19 Updated last year
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76 Updated last year
OpenThinkIMG / OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers Large Vision-Language Models to think with images.
☆98 Updated 4 months ago
360CVGroup / 360VL
Our 2nd-gen LMM
☆34 Updated last year
huggingface / docmatix
A huge dataset for Document Visual Question Answering
☆20 Updated last year
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆64 Updated last year
yangjie-cv / WeThink
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
☆35 Updated 5 months ago
bzluan / TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
☆44 Updated last year
Alpha-Innovator / SimChart9K
The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format.
☆26 Updated last year
GaryGuTC / UniME-v2
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
☆40 Updated last week
OpenGVLab / LCL
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
☆70 Updated 9 months ago
bytedance / MTVQA
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…
☆64 Updated 6 months ago