bytedance / vidi
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
☆578Updated 2 weeks ago
Alternatives and similar repositories for vidi
Users who are interested in vidi are comparing it to the repositories listed below
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)☆418Updated 3 months ago
- ☆370Updated 10 months ago
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation☆672Updated 3 months ago
- [AAAI-2026]FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation☆456Updated 11 months ago
- A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu…☆449Updated 2 months ago
- ☆291Updated 6 months ago
- ☆367Updated this week
- [SIGGRAPH2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"☆565Updated 10 months ago
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.☆580Updated 3 months ago
- GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.☆734Updated last week
- [NeurIPS 2025 Oral]Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation☆712Updated 2 months ago
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation☆838Updated last month
- MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning☆288Updated 10 months ago
- ☆251Updated last month
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆131Updated last year
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.☆167Updated last year
- Official Code Repo for UniVA: Universal Video Agents☆343Updated 2 weeks ago
- [🚀 ICLR 2026 Oral]NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s M…☆602Updated last month
- ☆612Updated last week
- Native Multimodal Models are World Learners☆1,448Updated last month
- VideoGen-Eval: Agent-based System for Video Generation Evaluation☆255Updated last month
- MOVA: Towards Scalable and Synchronized Video–Audio Generation☆492Updated this week
- Pusa: Thousands Timesteps Video Diffusion Model☆672Updated this week
- Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.☆877Updated 5 months ago
- HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning☆1,133Updated 2 weeks ago
- [ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image…☆341Updated last week
- 🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt☆312Updated 3 months ago
- We present FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while a…☆434Updated last month
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g…☆516Updated 5 months ago
- [ICCV 2025] AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction☆346Updated 10 months ago