opendatalab / laion5b-downloader

☆117

Alternatives and similar repositories for laion5b-downloader

Users who are interested in laion5b-downloader are comparing it to the libraries listed below.

OpenGVLab / MM-Interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
☆247 Updated last year
scenarios / WeMM
☆87 Updated last year
kyegomez / NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
☆266 Updated last month
opendatalab / VIGC
AAAI 2024: Visual Instruction Generation and Correction
☆93 Updated last year
thu-ml / zh-clip
☆72 Updated 2 years ago
mutonix / Vript
☆156 Updated 10 months ago
mulanai / MuLan
MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)
☆143 Updated 10 months ago
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆404 Updated 6 months ago
ksOAn6g5 / TaiSu
TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)
☆190 Updated 2 years ago
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆212 Updated last year
alipay / Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
☆169 Updated last month
Meituan-AutoML / VisionLLaMA
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
☆390 Updated last year
x-cls / superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
☆218 Updated 8 months ago
daooshee / HD-VG-130M
The HD-VG-130M Dataset
☆120 Updated last year
KlingTeam / Uniaa
Unified Multi-modal IAA Baseline and Benchmark
☆90 Updated last year
X-PLUG / Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
☆299 Updated last year
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
Beckschen / ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
☆210 Updated last year
JourneyDB / JourneyDB
☆180 Updated 2 weeks ago
Victorwz / Open-Qwen2VL
[COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
☆285 Updated 3 months ago
CuriseJia / ECCV24-FreeStyleRet
Precision Search through Multi-Style Inputs
☆73 Updated 4 months ago
JD-GenX / CAIG
[WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models"
☆58 Updated 3 months ago
ShareGPT4Omni / ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
☆243 Updated last year
bytedance / Valley
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
☆257 Updated 3 weeks ago
WangWenhao0716 / VidProM
[NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
☆166 Updated last year
showlab / VisorGPT
[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
☆136 Updated last year
zai-org / CogCoM
☆215 Updated last year
friedrichor / UNITE
Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"
☆38 Updated 4 months ago
AILab-CVC / SEED-X
Multimodal Models in Real World
☆549 Updated 9 months ago
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆598 Updated last year