RobertBiehl / multimodal-instruct
Instruction-tuning dataset generation via any LLM, inspired by LLaVA-Instruct-158k; usable for commercial purposes as well.
☆12 · Updated last year
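The repository's premise, generating LLaVA-Instruct-style instruction data from image annotations with an arbitrary text LLM, can be sketched roughly as follows. This is a minimal illustration: the function names, prompt format, and record schema are hypothetical, not the project's actual API, and the stand-in `fake_llm` would be replaced by a real chat-completion call.

```python
import json

def build_prompt(caption: str, boxes) -> str:
    """Render a caption plus object boxes into a text-only prompt,
    so a text LLM can 'see' the image (the LLaVA-Instruct trick)."""
    lines = [f"Caption: {caption}"]
    for label, (x1, y1, x2, y2) in boxes:
        lines.append(f"Object: {label} at [{x1:.2f}, {y1:.2f}, {x2:.2f}, {y2:.2f}]")
    lines.append("Write one question a user might ask about this image, "
                 "then answer it. Format: Q: ... A: ...")
    return "\n".join(lines)

def generate_sample(image_id: str, caption: str, boxes, llm) -> dict:
    """Call the LLM and wrap its reply as one instruction-tuning record
    (conversation schema mimics LLaVA-Instruct-158k; hypothetical here)."""
    reply = llm(build_prompt(caption, boxes))
    q, _, a = reply.partition("A:")
    return {
        "image": image_id,
        "conversations": [
            {"from": "human", "value": q.removeprefix("Q:").strip()},
            {"from": "gpt", "value": a.strip()},
        ],
    }

# Stand-in LLM so the sketch runs offline; swap in any chat-completion API.
def fake_llm(prompt: str) -> str:
    return "Q: What animal is on the sofa? A: A cat is lying on the sofa."

sample = generate_sample(
    "img_001.jpg",
    "a cat lying on a sofa",
    [("cat", [0.2, 0.3, 0.8, 0.9])],
    fake_llm,
)
print(json.dumps(sample, indent=2))
```

In practice one such record is emitted per source image, and the "any LLM" claim reduces to the `llm` callable being pluggable.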
Alternatives and similar repositories for multimodal-instruct:
Users interested in multimodal-instruct are comparing it to the repositories listed below.
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024. ☆29 · Updated 7 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆57 · Updated 6 months ago
- Code for "Merging Text Transformers from Different Initializations" ☆20 · Updated 2 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆30 · Updated 3 months ago
- Matryoshka Multimodal Models ☆99 · Updated 3 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆147 · Updated 10 months ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆64 · Updated 6 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆155 · Updated 4 months ago
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More ☆28 · Updated 2 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆53 · Updated 6 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆69 · Updated 5 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆128 · Updated 3 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆154 · Updated 7 months ago
- ☆133 · Updated last year
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆49 · Updated last month
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati… ☆96 · Updated 10 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆139 · Updated 6 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] ☆198 · Updated last month
- ☆41 · Updated 9 months ago
- ☆73 · Updated last year
- Code for Paper: "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆50 · Updated 4 months ago
- ☆63 · Updated last week
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆243 · Updated 4 months ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆319 · Updated 9 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆65 · Updated 10 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆129 · Updated 10 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 2 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆74 · Updated this week
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆108 · Updated this week