bytedance / vidi
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
☆578Updated 2 weeks ago
Alternatives and similar repositories for vidi
Users who are interested in vidi are comparing it to the repositories listed below.
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)☆418Updated 3 months ago
- A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu…☆449Updated 2 months ago
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation☆672Updated 3 months ago
- [AAAI-2026]FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation☆456Updated 11 months ago
- ☆291Updated 6 months ago
- MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning☆288Updated 10 months ago
- ☆370Updated 10 months ago
- [SIGGRAPH2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"☆565Updated 10 months ago
- Official Code Repo for UniVA: Universal Video Agents☆343Updated 2 weeks ago
- GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.☆734Updated last week
- VideoGen-Eval: Agent-based System for Video Generation Evaluation☆255Updated last month
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.☆580Updated 3 months ago
- 🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)☆305Updated 2 weeks ago
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation☆838Updated last month
- MOVA: Towards Scalable and Synchronized Video–Audio Generation☆492Updated this week
- [🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s M…☆602Updated last month
- Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.☆877Updated 5 months ago
- ☆612Updated last week
- ☆367Updated this week
- Native Multimodal Models are World Learners☆1,456Updated last month
- [NeurIPS 2025 Oral]Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation☆712Updated 2 months ago
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024)☆348Updated 3 weeks ago
- ☆572Updated last year
- [ICCV 2025] AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction☆346Updated 10 months ago
- ☆251Updated last month
- Pusa: Thousands Timesteps Video Diffusion Model☆672Updated this week
- 🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt☆312Updated 3 months ago
- [ICCV 2025] Video-T1: Test-Time Scaling for Video Generation☆305Updated 7 months ago
- [CVPR 2025] This is the official inference code of the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Gener…☆299Updated 10 months ago
- Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).☆420Updated 5 months ago