inclusionAI / MingLinks

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

☆421

Alternatives and similar repositories for Ming

Users that are interested in Ming are comparing it to the libraries listed below

Sorting:

modelscope / Nexus-Gen
☆259Updated last week
AILab-CVC / SEED-X
Multimodal Models in Real World
☆530Updated 5 months ago
VARGPT-family / VARGPT-v1.1
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
☆260Updated 3 months ago
PKU-YuanGroup / UniWorld-V1
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
☆671Updated this week
CodeGoat24 / UnifiedReward
Official implementation of UnifiedReward & UnifiedReward-Think
☆493Updated last week
FoundationVision / UniTok
A Unified Tokenizer for Visual Generation and Understanding
☆371Updated this week
AIDC-AI / Ovis-U1
An unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu…
☆387Updated this week
bytedance / vidi
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
☆126Updated last month
viiika / Meissonic
[ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image…
☆323Updated 3 weeks ago
bytedance / ContentV
☆124Updated last month
baaivision / NOVA
[ICLR 2025] Autoregressive Video Generation without Vector Quantization
☆553Updated 3 weeks ago
X-Omni-Team / X-Omni
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
☆170Updated last week
klingfoley / Kling-Foley
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
☆55Updated last month
bytedance / Valley
Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.
☆245Updated 5 months ago
EvolvingLMMs-Lab / Aero-1
☆77Updated 3 months ago
baichuan-inc / Baichuan-Omni-1.5
☆166Updated 5 months ago
VectorSpaceLab / Video-XL
🔥🔥First-ever hour scale video understanding models
☆517Updated 3 weeks ago
JavisDiT / JavisDiT
Official implementation of "JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization"
☆79Updated last week
Vision-CAIR / LongVU
[ICML 2025] Official PyTorch implementation of LongVU
☆392Updated 2 months ago
CaraJ7 / T2I-R1
Official repository of T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
☆371Updated last week
bytedance / tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions , together with g…
☆439Updated 3 months ago
zai-org / VisionReward
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
☆292Updated 4 months ago
YangLing0818 / IterComp
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
☆193Updated 5 months ago
mira-space / MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
☆461Updated 11 months ago
XiaomiMiMo / MiMo-VL
MiMo-VL
☆474Updated 2 weeks ago
XueZeyue / DanceGRPO
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
☆537Updated this week
Kwai-Keye / Keye
☆491Updated last week
modelscope / DiffSynth-Engine
☆180Updated this week
yeliudev / VideoMind
💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
☆239Updated last month
mulanai / MuLan
MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (无需额外训练为任意扩散模型支持多语言能力)
☆137Updated 6 months ago