bytedance / vidi
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
☆556Updated last month
Alternatives and similar repositories for vidi
Users that are interested in vidi are comparing it to the libraries listed below
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)☆363Updated 2 months ago
- ☆368Updated 9 months ago
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation☆669Updated 3 months ago
- A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerfu…☆445Updated last month
- ☆288Updated 5 months ago
- [AAAI-2026] FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation☆454Updated 10 months ago
- VideoGen-Eval: Agent-based System for Video Generation Evaluation☆253Updated last month
- MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning☆277Updated 9 months ago
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.☆569Updated 2 months ago
- UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation☆835Updated 3 weeks ago
- 🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt☆309Updated 2 months ago
- [SIGGRAPH2025] Official repo for paper "Any-length Video Inpainting and Editing with Plug-and-Play Context Control"☆545Updated 9 months ago
- GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.☆524Updated this week
- NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s Multimodal Intellige…☆594Updated 3 weeks ago
- ☆572Updated last year
- [ICCV 2025] AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction☆342Updated 9 months ago
- ☆572Updated this week
- Native Multimodal Models are World Learners☆1,399Updated 2 weeks ago
- [CVPR 2025] This is an official inference code of the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Gener…☆299Updated 9 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆130Updated last year
- Pusa: Thousands Timesteps Video Diffusion Model☆671Updated 4 months ago
- [NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation☆689Updated last month
- [CVPR 2025] Official repo for ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation☆362Updated 5 months ago
- Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with g…☆510Updated 5 months ago
- Multimodal Models in Real World☆552Updated 10 months ago
- ☆348Updated last week
- [ICML 2025] Official PyTorch implementation of LongVU☆417Updated 8 months ago
- Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).☆410Updated 4 months ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.☆165Updated 11 months ago
- The official code implementation of the paper "OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data."☆425Updated 7 months ago