Mixture-of-Groups Attention for End-to-End Long Video Generation
☆92Oct 22, 2025Updated 4 months ago
Alternatives and similar repositories for MoGA
Users who are interested in MoGA are comparing it to the repositories listed below
- ☆18Mar 21, 2025Updated 11 months ago
- Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions☆23Feb 11, 2026Updated 3 weeks ago
- OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models☆154Updated this week
- ☆16May 13, 2025Updated 9 months ago
- This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025.☆16Oct 20, 2025Updated 4 months ago
- Cost-Sensitive Toolpath Agent for Multi-turn Image Editing☆26Mar 26, 2025Updated 11 months ago
- USTC Cross-Modal Intelligence Group: Weekly Paper Sharing☆16Nov 20, 2022Updated 3 years ago
- Wan: Open and Advanced Large-Scale Video Generative Models☆28Jul 28, 2025Updated 7 months ago
- [ICLR 2026] Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"☆390Feb 8, 2026Updated last month
- [ICLR 2026] Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing☆29Feb 6, 2026Updated last month
- [ICME 2025] DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation☆24Mar 25, 2025Updated 11 months ago
- [ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models☆23Mar 29, 2025Updated 11 months ago
- Unifying Specialized Visual Encoders for Video Language Models☆25Nov 22, 2025Updated 3 months ago
- TPDiff: Temporal Pyramid Video Diffusion Model☆25Mar 13, 2025Updated 11 months ago
- [ICLR 2026] Generative View Stitching☆106Nov 7, 2025Updated 4 months ago
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.☆24Aug 5, 2023Updated 2 years ago
- [AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework☆45Jan 25, 2026Updated last month
- ☆22Dec 11, 2024Updated last year
- [ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation☆106Feb 6, 2026Updated last month
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models☆39Jan 5, 2026Updated 2 months ago
- Official Code Release of NeurIPS 2025 Paper: HoloScene: Simulation‑Ready Interactive 3D Worlds from a Single Video☆90Oct 8, 2025Updated 5 months ago
- [Unofficial Implementation] Subject-driven Video Generation via Disentangled Identity and Motion☆58Jan 5, 2026Updated 2 months ago
- Video Diffusion Transformers are In-Context Learners☆35Jan 6, 2025Updated last year
- Extend the Conditioning of Stable Diffusion to take Audio Embeddings Instead of Text Embeddings using Wav2Vec2-BERT model☆13Sep 25, 2024Updated last year
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- [NeurIPS 2025] ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression☆50Nov 4, 2025Updated 4 months ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Updated this week
- Code release for AccDiffusionV2 (TPAMI)☆35Nov 4, 2025Updated 4 months ago
- Official implementation of project NoiseCLR, published at CVPR 2024☆29Jun 15, 2024Updated last year
- [CVPR 2026] Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives☆634Nov 26, 2025Updated 3 months ago
- The official implementation of "Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers" (arXiv …☆51Jun 6, 2025Updated 9 months ago
- ☆110Sep 3, 2025Updated 6 months ago
- [ICCV 2025 Findings Oral] DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting☆40Nov 20, 2025Updated 3 months ago
- Scaling Zero-Shot Reference-to-Video Generation☆63Dec 11, 2025Updated 2 months ago
- ☆18Jun 10, 2025Updated 9 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"☆59Jan 23, 2026Updated last month
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Oct 18, 2024Updated last year
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆33May 1, 2025Updated 10 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆80Dec 10, 2024Updated last year