DAMO-NLP-SG / multimodal_textbook
The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
☆97Updated this week
Alternatives and similar repositories for multimodal_textbook:
Users who are interested in multimodal_textbook are comparing it to the repositories listed below
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆67Updated last month
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆108Updated last month
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆127Updated last month
- Official implementation of MIA-DPO☆48Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆41Updated this week
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆126Updated 3 weeks ago
- MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities.☆77Updated 3 months ago
- Official repository of MMDU dataset☆79Updated 3 months ago
- A Survey on Benchmarks of Multimodal Large Language Models☆76Updated last week
- LVBench: An Extreme Long Video Understanding Benchmark☆70Updated 4 months ago
- ☆47Updated this week
- ☆47Updated last year
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆75Updated 6 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆58Updated 6 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"☆44Updated 2 months ago
- ☆74Updated 10 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆47Updated last month
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆62Updated 7 months ago
- A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo☆32Updated 4 months ago
- ☆92Updated last year
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆146Updated 2 weeks ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆40Updated 6 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs☆93Updated 2 weeks ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆38Updated 5 months ago
- ☆42Updated 5 months ago
- ☆59Updated 11 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆31Updated 6 months ago
- ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI☆96Updated 5 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆70Updated 2 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆28Updated 5 months ago