RobertBiehl / multimodal-instruct
Instruction-tuning dataset generation via any LLM, inspired by LLaVA-Instruct-158k; usable for commercial purposes as well.
☆12 · Updated last year
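The repository's premise, generating LLaVA-Instruct-style instruction data from image annotations with an arbitrary text LLM, can be sketched roughly as follows. This is a minimal illustration: the function names, prompt format, and record schema are hypothetical, not the project's actual API, and the stand-in `fake_llm` would be replaced by a real chat-completion call.

```python
import json

def build_prompt(caption: str, boxes) -> str:
    """Render a caption plus object boxes into a text-only prompt,
    so a text LLM can 'see' the image (the LLaVA-Instruct trick)."""
    lines = [f"Caption: {caption}"]
    for label, (x1, y1, x2, y2) in boxes:
        lines.append(f"Object: {label} at [{x1:.2f}, {y1:.2f}, {x2:.2f}, {y2:.2f}]")
    lines.append("Write one question a user might ask about this image, "
                 "then answer it. Format: Q: ... A: ...")
    return "\n".join(lines)

def generate_sample(image_id: str, caption: str, boxes, llm) -> dict:
    """Call the LLM and wrap its reply as one instruction-tuning record
    (conversation schema mimics LLaVA-Instruct-158k; hypothetical here)."""
    reply = llm(build_prompt(caption, boxes))
    q, _, a = reply.partition("A:")
    return {
        "image": image_id,
        "conversations": [
            {"from": "human", "value": q.removeprefix("Q:").strip()},
            {"from": "gpt", "value": a.strip()},
        ],
    }

# Stand-in LLM so the sketch runs offline; swap in any chat-completion API.
def fake_llm(prompt: str) -> str:
    return "Q: What animal is on the sofa? A: A cat is lying on the sofa."

sample = generate_sample(
    "img_001.jpg",
    "a cat lying on a sofa",
    [("cat", [0.2, 0.3, 0.8, 0.9])],
    fake_llm,
)
print(json.dumps(sample, indent=2))
```

In practice one such record is emitted per source image, and the "any LLM" claim reduces to the `llm` callable being pluggable.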
Alternatives and similar repositories for multimodal-instruct:
Users interested in multimodal-instruct are comparing it to the repositories listed below.
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024. ☆29 · Updated 7 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆57 · Updated 6 months ago
- Code for "Merging Text Transformers from Different Initializations" ☆20 · Updated 2 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆30 · Updated 3 months ago
- Matryoshka Multimodal Models ☆99 · Updated 3 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆147 · Updated 10 months ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆64 · Updated 6 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆155 · Updated 4 months ago
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More ☆28 · Updated 2 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs" ☆53 · Updated 6 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆69 · Updated 5 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆128 · Updated 3 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆154 · Updated 7 months ago
- ☆133 · Updated last year
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆49 · Updated last month
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati… ☆96 · Updated 10 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆139 · Updated 6 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] ☆198 · Updated last month
- ☆41 · Updated 9 months ago
- ☆73 · Updated last year
- Code for Paper: "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆50 · Updated 4 months ago
- ☆63 · Updated last week
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆243 · Updated 4 months ago
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆319 · Updated 9 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆65 · Updated 10 months ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆129 · Updated 10 months ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 2 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆74 · Updated this week
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆108 · Updated this week