inclusionAI / Ming
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
☆328Updated last week
Alternatives and similar repositories for Ming
Users who are interested in Ming are comparing it to the repositories listed below
- ☆219Updated 3 weeks ago
- A Unified Tokenizer for Visual Generation and Understanding☆340Updated last month
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning☆251Updated 2 months ago
- Official implementation of UnifiedReward & UnifiedReward-Think☆429Updated last week
- ☆310Updated last week
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"☆371Updated this week
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning☆217Updated last month
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization☆532Updated last month
- Long Context Transfer from Language to Vision☆382Updated 3 months ago
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"☆112Updated this week
- ☆76Updated 3 months ago
- Multimodal Models in Real World☆513Updated 4 months ago
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation☆267Updated 2 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models☆155Updated 8 months ago
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation☆566Updated this week
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation☆304Updated 3 weeks ago
- ☆105Updated last week
- [ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation☆190Updated 4 months ago
- ☆152Updated 5 months ago
- [ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video …☆136Updated 3 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆337Updated 3 months ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual capability to any diffusion model without additional training)☆136Updated 5 months ago
- [ICML 2025] Official PyTorch implementation of LongVU☆383Updated last month
- Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with g…☆409Updated 2 months ago
- Official repository of T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT☆334Updated last week
- 🔥🔥First-ever hour-scale video understanding models☆437Updated 3 weeks ago
- Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"☆160Updated 2 months ago
- An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL☆783Updated last week
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆242Updated 3 months ago
- ☆401Updated 2 weeks ago