sanbuphy / llm-vision-datasetsLinks

Collection of image and video datasets for generative AI and multimodal visual AI

☆32

Alternatives and similar repositories for llm-vision-datasets

Users that are interested in llm-vision-datasets are comparing it to the libraries listed below

Sorting:

hhaAndroid / awesome-mm-chat
多模态 MM +Chat 合集
☆279Updated 3 months ago
Victorwz / Open-Qwen2VL
[COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
☆287Updated 3 months ago
swordlidev / Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey
☆375Updated 7 months ago
bobo0810 / LearnDeepSpeed
DeepSpeed教程 & 示例注释 & 学习笔记（大模型高效训练）
☆183Updated 2 years ago
jefferyZhan / Griffon
Official repo of Griffon series including v1(ECCV 2024), v2(ICCV 2025), G, and R, and also the RL tool Vision-R1.
☆245Updated 3 months ago
shufangxun / LLaVA-MoD
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
☆210Updated 8 months ago
alipay / Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
☆169Updated last month
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆406Updated 6 months ago
RLHF-V / RLAIF-V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
☆424Updated 6 months ago
FanqingM / MM-Eureka-V0
MM-Eureka V0 also called R1-Multimodal-Journey, Latest version is in MM-Eureka
☆321Updated 5 months ago
open-compass / MMBench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
☆269Updated 6 months ago
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆298Updated last year
kyegomez / NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
☆266Updated last month
pkunlp-icler / FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…
☆519Updated 11 months ago
wdndev / mllm_interview_note
主要记录大语言大模型（LLMs）算法（应用）工程师多模态相关知识
☆252Updated last year
Meituan-AutoML / VisionLLaMA
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
☆390Updated last year
OpenGVLab / LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
☆318Updated last year
360CVGroup / FG-CLIP
New generation of CLIP with fine grained discrimination capability, ICML2025
☆485Updated last month
Coobiw / MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next(Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conv…
☆521Updated 8 months ago
TinyLLaVA / TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
☆924Updated 7 months ago
thunlp / LLaVA-UHD
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆397Updated last week
x-cls / superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
☆218Updated 8 months ago
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆277Updated last year
opendatalab / VIGC
AAAI 2024: Visual Instruction Generation and Correction
☆93Updated last year
NExT-ChatV / NExT-Chat
The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
☆255Updated last year
ksOAn6g5 / TaiSu
TaiSu（太素）--a large-scale Chinese multimodal dataset（亿级大规模中文视觉语言预训练数据集）
☆190Updated 2 years ago
zai-org / CogCoM
☆215Updated last year
OpenRLHF / OpenRLHF-M
An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.
☆149Updated last month
zhangfaen / finetune-Qwen2-VL
☆381Updated 9 months ago
linkangheng / PR1
[NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
☆270Updated 4 months ago