inclusionAI / Ming
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
☆454Updated last week
Alternatives and similar repositories for Ming
Users who are interested in Ming are comparing it to the repositories listed below
- ☆270Updated last month
- ☆530Updated last week
- Official implementation of UnifiedReward & UnifiedReward-Think☆531Updated last week
- Multimodal Models in Real World☆540Updated 6 months ago
- ☆171Updated 7 months ago
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation☆696Updated last month
- A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu…☆412Updated last month
- MiMo-VL☆538Updated 3 weeks ago
- A Unified Tokenizer for Visual Generation and Understanding☆396Updated last month
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning☆263Updated 5 months ago
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)☆264Updated this week
- ☆615Updated last week
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"☆134Updated 2 weeks ago
- 🔥🔥First-ever hour scale video understanding models☆541Updated 2 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆249Updated last month
- [ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image…☆327Updated 3 weeks ago
- The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.☆244Updated 2 weeks ago
- Official PyTorch implementation of EMOVA in CVPR 2025 (https://arxiv.org/abs/2409.18042)☆67Updated 5 months ago
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization☆567Updated last week
- [ICML 2025] Official PyTorch implementation of LongVU☆396Updated 4 months ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning☆250Updated 2 weeks ago
- Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).☆371Updated 2 weeks ago
- Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation☆59Updated 2 months ago
- ☆123Updated last month
- [ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video …☆150Updated 5 months ago
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation☆350Updated this week
- ☆78Updated 4 months ago
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini…☆622Updated 5 months ago
- video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d…☆62Updated 3 weeks ago
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"☆477Updated last year