inclusionAI / Ming
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
☆421 Updated last week
Alternatives and similar repositories for Ming
Users who are interested in Ming are comparing it to the repositories listed below.
- ☆259 Updated last week
- Multimodal Models in Real World ☆530 Updated 5 months ago
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning ☆260 Updated 3 months ago
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation ☆671 Updated this week
- Official implementation of UnifiedReward & UnifiedReward-Think ☆493 Updated last week
- A Unified Tokenizer for Visual Generation and Understanding ☆371 Updated this week
- A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu… ☆387 Updated this week
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing" ☆126 Updated last month
- [ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image… ☆323 Updated 3 weeks ago
- ☆124 Updated last month
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ☆553 Updated 3 weeks ago
- Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058). ☆170 Updated last week
- Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation ☆55 Updated last month
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆245 Updated 5 months ago
- ☆77 Updated 3 months ago
- ☆166 Updated 5 months ago
- 🔥🔥 First-ever hour-scale video understanding models ☆517 Updated 3 weeks ago
- Official implementation of "JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization" ☆79 Updated last week
- [ICML 2025] Official PyTorch implementation of LongVU ☆392 Updated 2 months ago
- Official repository of T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT ☆371 Updated last week
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g… ☆439 Updated 3 months ago
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation ☆292 Updated 4 months ago
- [ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation ☆193 Updated 5 months ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ☆461 Updated 11 months ago
- MiMo-VL ☆474 Updated 2 weeks ago
- An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation ☆537 Updated this week
- ☆491 Updated last week
- ☆180 Updated this week
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆239 Updated last month
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training) ☆137 Updated 6 months ago