bytedance / vidi
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
☆393 Updated this week
Alternatives and similar repositories for vidi
Users who are interested in vidi are comparing it to the repositories listed below.
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆312 Updated last month
- A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu… ☆437 Updated this week
- [AAAI-2026] FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation ☆453 Updated 9 months ago
- ☆363 Updated 8 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆130 Updated last year
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation ☆659 Updated last month
- [SIGGRAPH2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control" ☆525 Updated 8 months ago
- ☆571 Updated last year
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ☆549 Updated last month
- [NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation ☆626 Updated last week
- ☆282 Updated 4 months ago
- Native Multimodal Models are World Learners ☆1,324 Updated last week
- ☆313 Updated this week
- MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning ☆263 Updated 8 months ago
- [ICCV 2025] AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction ☆341 Updated 7 months ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆161 Updated 10 months ago
- Echo-4o ☆248 Updated last month
- ☆573 Updated 3 weeks ago
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation ☆810 Updated last week
- VideoGen-Eval: Agent-based System for Video Generation Evaluation ☆251 Updated 8 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆413 Updated 6 months ago
- 🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt ☆309 Updated last month
- Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation. ☆762 Updated 3 months ago
- Pusa: Thousands Timesteps Video Diffusion Model ☆665 Updated 3 months ago
- [CVPR 2025] Official repo for ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation ☆352 Updated 4 months ago
- [CVPR 2025] This is an official inference code of the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Gener… ☆294 Updated 8 months ago
- [ICLR'25] MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences ☆317 Updated last year
- [ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning ☆208 Updated last month
- [ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video … ☆157 Updated 8 months ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆284 Updated last month