This project collects and collates datasets for training multimodal large models, including but not limited to pre-training data, instruction fine-tuning data, and in-context learning data.
☆74 · May 7, 2025 · Updated 11 months ago
Alternatives and similar repositories for Awesome-MLLM-Datasets
Users that are interested in Awesome-MLLM-Datasets are comparing it to the libraries listed below.
- ☆19 · May 14, 2024 · Updated last year
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆40 · Jan 4, 2024 · Updated 2 years ago
- A multimodal large model for X-ray recognition, fine-tuned from LLaVA1.6 ☆10 · Oct 22, 2024 · Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆16 · Jul 6, 2024 · Updated last year
- Multi-Task instruction-tuned LLaMA ☆14 · May 5, 2023 · Updated 2 years ago
- A Collection of Papers on Diffusion Language Models ☆164 · Sep 15, 2025 · Updated 6 months ago
- MaXM is a suite of test-only benchmarks for multilingual visual question answering in 7 languages: English (en), French (fr), Hindi (hi),… ☆13 · Jan 16, 2024 · Updated 2 years ago
- Paper collections of multi-modal LLM for Math/STEM/Code. ☆139 · Nov 17, 2025 · Updated 4 months ago
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence. ☆28 · Dec 30, 2025 · Updated 3 months ago
- Dataset for EMNLP'23 Paper "DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading" ☆11 · Oct 25, 2023 · Updated 2 years ago
- [ACL 2025] Towards Text-Image Interleaved Retrieval ☆16 · Sep 3, 2025 · Updated 7 months ago
- This repo offers advanced tutorials for LLMs, BERT-based models, and multimodal models, covering fine-tuning, quantization, vocabulary ex… ☆24 · May 5, 2025 · Updated 11 months ago
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… ☆365 · Mar 19, 2025 · Updated last year
- This repository is the official data collection of MMFundus (Multimodal Fundus) dataset. ☆13 · Feb 2, 2026 · Updated 2 months ago
- paper list, tutorial, and nano code snippet for Diffusion Large Language Models. ☆162 · Jan 19, 2026 · Updated 2 months ago
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering ☆46 · Jun 19, 2025 · Updated 9 months ago
- Efficient Segment Anything in Medical Images ☆42 · Jul 27, 2024 · Updated last year
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas… ☆1,392 · Feb 26, 2026 · Updated last month
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆57 · Oct 28, 2024 · Updated last year
- Using convolutional neural networks for the 2019 Kidney and Kidney Tumor Segmentation Challenge ☆19 · Dec 13, 2019 · Updated 6 years ago
- GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI. ☆84 · Dec 17, 2024 · Updated last year
- iSegFormer: Interactive Image/Volume Segmentation using Vision Transformers (MICCAI 2022) ☆31 · Oct 24, 2025 · Updated 5 months ago
- [ICML2024] Repo for the paper `Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models' ☆24 · Jan 1, 2025 · Updated last year
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆14 · Jun 21, 2024 · Updated last year
- a py3 lib for NLP & image-caption metrics : BLEU METEOR CIDEr ROUGE SPICE WMD ☆14 · Sep 13, 2022 · Updated 3 years ago
- Awesome paper for multi-modal llm with grounding ability ☆19 · Oct 11, 2025 · Updated 5 months ago
- ☆111 · Sep 11, 2025 · Updated 6 months ago
- Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer. ☆18 · Apr 23, 2023 · Updated 2 years ago
- Image-to-Image Translation in PyTorch ☆13 · Mar 2, 2021 · Updated 5 years ago
- This is for ACL 2025 Findings Paper: From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities ☆95 · Mar 22, 2026 · Updated 2 weeks ago
- ☆13 · Aug 28, 2018 · Updated 7 years ago
- Official implementation of ICML 2025 paper "Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach" ☆12 · May 27, 2025 · Updated 10 months ago
- Research and Implementation of Finger Vein Recognition Algorithm ☆13 · May 10, 2021 · Updated 4 years ago
- TensorFlow version of SqueezeNet with converted pretrained weights ☆27 · Mar 11, 2017 · Updated 9 years ago
- [ECCV 2020] Learning to Separate: Detecting Heavily-Occluded Objects in Urban Scenes ☆12 · Dec 11, 2020 · Updated 5 years ago
- [EMNLP'2023 Findings] MoqaGPT, for zero-shot multimodal question answering with LLMs ☆13 · Dec 28, 2024 · Updated last year
- Fill those pesky holes in your depth map with ofxKinectInpainter! ☆25 · Feb 16, 2012 · Updated 14 years ago
- Source code for WWW 2019 paper "Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification" ☆14 · May 3, 2019 · Updated 6 years ago
- 3D Telecommunications project utilizing Holoportation technology to provide live volumetric capture. Used in one case to increase the re… ☆21 · Feb 20, 2026 · Updated last month