bytedance/vidi

The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"

☆646

Alternatives and similar repositories for vidi

Users who are interested in vidi are comparing it to the libraries listed below.

marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
Yaofang-Liu / Pusa-VidGen
Pusa: Thousands Timesteps Video Diffusion Model
☆685 · Feb 13, 2026 · Updated 5 months ago
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆268 · Oct 18, 2025 · Updated 9 months ago
ali-vilab / VACE
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
☆3,870 · Oct 17, 2025 · Updated 9 months ago
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆101 · Oct 15, 2025 · Updated 9 months ago
Kwai-Keye / Keye
☆805 · Jun 10, 2026 · Updated last month
VectorSpaceLab / Video-XL
🔥🔥 First-ever hour scale video understanding models
☆626 · Jul 14, 2025 · Updated last year
KlingAIResearch / UniVideo
[ICLR 2026] UniVideo: Unified Understanding, Generation, and Editing for Videos
☆539 · Jul 3, 2026 · Updated 2 weeks ago
PKU-YuanGroup / UniWorld
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
☆883 · Dec 23, 2025 · Updated 6 months ago
TencentARC / ARC-Hunyuan-Video-7B
Structured Video Comprehension of Real-World Shorts
☆238 · Sep 21, 2025 · Updated 10 months ago
bytedance / Video-As-Prompt
[ICLR 2026] Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"
☆439 · Feb 8, 2026 · Updated 5 months ago
OpenVE-Team / OpenVE-3M
OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
☆51 · Apr 15, 2026 · Updated 3 months ago
baaivision / Emu3.5
Native Multimodal Models are World Learners
☆1,536 · Dec 30, 2025 · Updated 6 months ago
thu-ml / TurboDiffusion
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
☆3,577 · Updated this week
ByteDance-Seed / Bagel
Open-source unified multimodal model
☆6,103 · May 4, 2026 · Updated 2 months ago
Kevin-thu / StoryMem
Official code for StoryMem: Multi-shot Long Video Storytelling with Memory
☆761 · May 25, 2026 · Updated last month
TencentARC / TimeLens
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
☆158 · Apr 27, 2026 · Updated 2 months ago
EzioBy / Ditto
[CVPR'26 Highlight] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
☆617 · Jun 1, 2026 · Updated last month
Vchitect / LongVie
☆333 · Jan 24, 2026 · Updated 5 months ago
nv-tlabs / ChronoEdit
[ICLR 2026] ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
☆697 · Nov 20, 2025 · Updated 8 months ago
Phantom-video / HuMo
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
☆1,273 · Jan 25, 2026 · Updated 5 months ago
zai-org / RealVideo
A real-time streaming conversational video system that transforms text interactions into continuous, high-fidelity video responses using …
☆334 · Dec 15, 2025 · Updated 7 months ago
bytedance / lynx
Lynx: Towards High-Fidelity Personalized Video Generation
☆336 · Feb 27, 2026 · Updated 4 months ago
bytedance / video-SALMONN-2
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is d…
☆204 · Feb 23, 2026 · Updated 4 months ago
FoundationVision / Waver
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
☆948 · Aug 27, 2025 · Updated 10 months ago
tianweiy / CausVid
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
☆1,397 · Aug 7, 2025 · Updated 11 months ago
bytedance / ContentV
☆130 · Jun 24, 2025 · Updated last year
hao-ai-lab / FastVideo
A unified inference and post-training framework for accelerated video generation.
☆3,862 · Updated this week
justincui03 / Self-Forcing-Plus-Plus
Official Repo for Self-Forcing++ High Quality Long Video Generation
☆264 · Oct 13, 2025 · Updated 9 months ago
FunAudioLLM / FunCineForge
☆442 · Mar 25, 2026 · Updated 3 months ago
stepfun-ai / NextStep-1
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s …
☆689 · Feb 27, 2026 · Updated 4 months ago
alex4727 / MotionStream
MotionStream: Real-Time Video Generation with Interactive Motion Controls
☆571 · Mar 1, 2026 · Updated 4 months ago
OpenGVLab / VideoChat-Flash
[ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
☆526 · Updated this week
SandAI-org / MAGI-1
MAGI-1: Autoregressive Video Generation at Scale
☆3,741 · Jun 17, 2026 · Updated last month
stepfun-ai / Step1X-Edit
A SOTA open-source image editing model, which aims to provide comparable performance against the closed-source models like GPT-4o and Gem…
☆2,236 · Apr 29, 2026 · Updated 2 months ago
FoundationVision / InfinityStar
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
☆772 · Apr 16, 2026 · Updated 3 months ago
yeliudev / VideoMind
🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)
☆346 · Feb 8, 2026 · Updated 5 months ago
QwenLM / Qwen-Image
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
☆8,132 · Feb 10, 2026 · Updated 5 months ago
aigc-apps / VideoX-Fun
📹 A more flexible framework that can generate videos at any resolution and creates videos from images.
☆2,175 · Updated this week