bytedance / vidi
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
☆541 · Updated 2 weeks ago
Alternatives and similar repositories for vidi
Users interested in vidi are comparing it to the repositories listed below.
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆356 · Updated last month
- A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu… ☆445 · Updated 3 weeks ago
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation ☆663 · Updated 2 months ago
- [SIGGRAPH 2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control" ☆534 · Updated 8 months ago
- ☆285 · Updated 4 months ago
- ☆365 · Updated 9 months ago
- ☆571 · Updated last year
- [AAAI-2026] FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation ☆454 · Updated 9 months ago
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ☆561 · Updated last month
- ☆330 · Updated this week
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆130 · Updated last year
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆163 · Updated 10 months ago
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation ☆822 · Updated last month
- [CVPR 2025] This is an official inference code of the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Gener… ☆295 · Updated 8 months ago
- Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions, together with g… ☆507 · Updated 4 months ago
- [ICCV 2025] AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction ☆341 · Updated 8 months ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆289 · Updated 2 months ago
- MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning ☆274 · Updated 9 months ago
- ☆579 · Updated last month
- [CVPR 2025] Official repo for ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation ☆358 · Updated 4 months ago
- HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning ☆1,039 · Updated this week
- Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024) ☆334 · Updated 2 months ago
- [ICML 2025] Official PyTorch implementation of LongVU ☆412 · Updated 7 months ago
- VideoGen-Eval: Agent-based System for Video Generation Evaluation ☆254 · Updated last week
- [ICCV 2025] Video-T1: Test-Time Scaling for Video Generation ☆303 · Updated 5 months ago
- Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058). ☆403 · Updated 4 months ago
- [NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation ☆670 · Updated last month
- Native Multimodal Models are World Learners ☆1,374 · Updated last month
- [ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning ☆208 · Updated last month
- ☆519 · Updated this week