jinbo0906/Awesome-MLLM-Datasets

This project collects and curates datasets for training multimodal large models, including but not limited to pre-training data, instruction fine-tuning data, and in-context learning data.

☆78

Alternatives and similar repositories for Awesome-MLLM-Datasets

Users that are interested in Awesome-MLLM-Datasets are comparing it to the libraries listed below.

LHL3341 / ContextBLIP
ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions [ACL 2024]
☆11 · May 17, 2024 · Updated 2 years ago
SCNU203 / GeoQA-Plus
☆20 · May 14, 2024 · Updated 2 years ago
FreedomIntelligence / FastLLM
Fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CudaFusionKernel + Compiler]
☆41 · Jan 4, 2024 · Updated 2 years ago
appy1608 / EMNLP2023-Multimodal-Complaint-Detection
Federated Meta-Learning for Emotion and Sentiment Aware Multi-modal Complaint Identification
☆10 · May 30, 2024 · Updated 2 years ago
leeguandong / XrayLLaVA
A multimodal large model for X-ray recognition, fine-tuned from LLaVA 1.6
☆10 · Oct 22, 2024 · Updated last year
google-research-datasets / maxm
MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),…
☆13 · Jan 16, 2024 · Updated 2 years ago
mil-tokyo / Megatron-VLM
☆26 · Feb 2, 2025 · Updated last year
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,434 · May 11, 2026 · Updated 2 months ago
InfiMM / Awesome-Multimodal-LLM-for-Math-STEM
Paper collections of multi-modal LLM for Math/STEM/Code.
☆144 · May 17, 2026 · Updated 2 months ago
hint-lab / doctrack
Dataset for EMNLP'23 Paper "DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading"
☆11 · Oct 25, 2023 · Updated 2 years ago
kai422 / SCALE
[ICLR 2024] Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement.
☆15 · Mar 12, 2024 · Updated 2 years ago
WangRongsheng / LLM101
This repo offers advanced tutorials for LLMs, BERT-based models, and multimodal models, covering fine-tuning, quantization, vocabulary ex…
☆24 · May 5, 2025 · Updated last year
HUANGLIZI / MMFundus
This repository is the official data collection of MMFundus (Multimodal Fundus) dataset.
☆13 · Feb 2, 2026 · Updated 5 months ago
bensantos / webcam-CycleGAN
Image-to-Image Translation in PyTorch
☆13 · Mar 2, 2021 · Updated 5 years ago
isLinXu / DatasetMarkerTool
🔨🔨🔨 Tool for making model training datasets
☆20 · Nov 1, 2024 · Updated last year
hieplpvip / medficientsam
Efficient Segment Anything in Medical Images
☆45 · Jul 27, 2024 · Updated last year
taovv / UGPCL
[IJCAI'22] Uncertainty-Guided Pixel Contrastive Learning for Semi-Supervised Medical Image Segmentation.
☆26 · Aug 4, 2022 · Updated 3 years ago
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆58 · Oct 28, 2024 · Updated last year
jchen42703 / kits19-cnn
Using convolutional neural networks for the 2019 Kidney and Kidney Tumor Segmentation Challenge
☆19 · Dec 13, 2019 · Updated 6 years ago
yczhou001 / Awesome-Diffusion-LLM
Paper list, tutorials, and nano code snippets for Diffusion Large Language Models.
☆170 · Jan 19, 2026 · Updated 6 months ago
yeezhu / UNIT
PyTorch implementation of "UNIT: Unifying Image and Text Recognition in One Vision Encoder", NeurIPS 2024.
☆34 · Sep 26, 2024 · Updated last year
uni-medical / GMAI-MMBench
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
☆86 · Dec 17, 2024 · Updated last year
keven980716 / weak-to-strong-deception
[ICLR 2025] Code & Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"
☆15 · Jun 21, 2024 · Updated 2 years ago
Lackel / Hierarchical_Weighted_SCL
[EMNLP 2022] Fine-grained Category Discovery under Coarse-grained supervision with Hierarchical Weighted Self-contrastive Learning
☆14 · Jun 22, 2024 · Updated 2 years ago
threegold116 / Awesome-Omni-MLLMs
This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
☆102 · Mar 22, 2026 · Updated 3 months ago
ML-GSAI / Diffusion-LLM-Papers
A Collection of Papers on Diffusion Language Models
☆180 · Sep 15, 2025 · Updated 10 months ago
mrwu-mac / R-Bench
[ICML 2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models"
☆24 · Jan 1, 2025 · Updated last year
BUAADreamer / llmkiller
A collection of LLM code written by hand from scratch
☆23 · Mar 25, 2025 · Updated last year
williamium3000 / awesome-mllm-grounding
Awesome papers on multimodal LLMs with grounding ability
☆21 · Oct 11, 2025 · Updated 9 months ago
uakarsh / TiLT-Implementation
Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.
☆18 · Apr 23, 2023 · Updated 3 years ago
OpenDCAI / Awesome_MLLMs_Reasoning
☆112 · Sep 11, 2025 · Updated 10 months ago
muxin-wei / Rep-MedSAM
Top 3 solution for CVPR24 SEGMENT ANYTHING IN MEDICAL IMAGES ON LAPTOP Challenge
☆11 · Apr 8, 2025 · Updated last year
lezhang7 / MOQAGPT
[EMNLP'2023 Findings] MoqaGPT, for zero-shot multimodal question answering with LLMs
☆13 · Dec 28, 2024 · Updated last year
nixiesearch / negminer
A hard negative mining tool for embedding model training
☆15 · Sep 27, 2024 · Updated last year
ChenhongyiYang / SG-NMS
[ECCV 2020] Learning to Separate: Detecting Heavily-Occluded Objects in Urban Scenes
☆12 · Dec 11, 2020 · Updated 5 years ago
FreedomIntelligence / Med-MAT
[ACL 2025] Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging
☆40 · Jun 4, 2025 · Updated last year
triplemeng / InferSent
☆13 · Aug 28, 2018 · Updated 7 years ago
wyczzy / StealthDiffusion
This repository is the official implementation of StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model
☆21 · Jul 30, 2024 · Updated last year
Ayanzadeh93 / Dog-Breeds-Identification
In this playground competition, you are provided a strictly canine subset of ImageNet in order to practice fine-grained image categorizat…
☆11 · Dec 10, 2020 · Updated 5 years ago