RobertBiehl / multimodal-instruct
Instruction-tuning dataset generation, inspired by LLaVA-Instruct-158k, using any LLM; also suitable for commercial use.
☆12Updated 10 months ago
Alternatives and similar repositories for multimodal-instruct:
Users who are interested in multimodal-instruct are comparing it to the libraries listed below:
- Code for "Merging Text Transformers from Different Initializations"☆19Updated 5 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆48Updated 3 months ago
- A huge dataset for Document Visual Question Answering☆15Updated 6 months ago
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆81Updated 4 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts☆138Updated 7 months ago
- Clipora is a powerful toolkit for fine-tuning OpenCLIP models using Low Rank Adapters (LoRA).☆19Updated 5 months ago
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆122Updated 3 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆224Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆188Updated 3 weeks ago
- ☆89Updated last year
- Matryoshka Multimodal Models☆93Updated last week
- 【NeurIPS 2024】Dense Connector for MLLMs☆154Updated 3 months ago
- ☆22Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆252Updated 7 months ago
- LL3M: Large Language and Multi-Modal Model in Jax☆68Updated 9 months ago
- Code for T-MARS data filtering☆35Updated last year
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆61Updated 3 months ago
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆89Updated 3 weeks ago
- ☆132Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"☆23Updated this week
- [ACL 2024 Findings & ICLR 2024 WS] An Evaluator VLM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specific…☆62Updated 4 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆22Updated 3 weeks ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆40Updated 7 months ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning☆100Updated 4 months ago
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…☆89Updated 7 months ago
- PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.☆27Updated 4 months ago
- ☆73Updated 10 months ago
- ☆22Updated 10 months ago
- An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation☆108Updated last year
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"☆46Updated 3 months ago