bytedance-fanqie-ai / MoGA
Mixture-of-Groups Attention for End-to-End Long Video Generation
☆92Oct 22, 2025Updated 3 months ago
Alternatives and similar repositories for MoGA
Users who are interested in MoGA are comparing it to the repositories listed below
- ☆18Mar 21, 2025Updated 10 months ago
- OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models☆153Sep 24, 2025Updated 4 months ago
- ☆16May 13, 2025Updated 9 months ago
- This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025.☆16Oct 20, 2025Updated 3 months ago
- [ICLR 2026] Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"☆377Feb 8, 2026Updated last week
- Cost-Sensitive Toolpath Agent for Multi-turn Image Editing☆26Mar 26, 2025Updated 10 months ago
- USTC Cross-Modal Intelligence Group: weekly paper sharing☆16Nov 20, 2022Updated 3 years ago
- Wan: Open and Advanced Large-Scale Video Generative Models☆28Jul 28, 2025Updated 6 months ago
- [ICLR 2026] Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing☆29Feb 6, 2026Updated last week
- [ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models☆23Mar 29, 2025Updated 10 months ago
- [ICME 2025] DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation☆24Mar 25, 2025Updated 10 months ago
- [ICLR 2026] Generative View Stitching☆100Nov 7, 2025Updated 3 months ago
- TPDiff: Temporal Pyramid Video Diffusion Model☆23Mar 13, 2025Updated 11 months ago
- [AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework☆44Jan 25, 2026Updated 3 weeks ago
- Unifying Specialized Visual Encoders for Video Language Models☆25Nov 22, 2025Updated 2 months ago
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.☆24Aug 5, 2023Updated 2 years ago
- ☆22Dec 11, 2024Updated last year
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models☆38Jan 5, 2026Updated last month
- [ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation☆102Feb 6, 2026Updated last week
- Official Code Release of NeurIPS 2025 Paper: HoloScene: Simulation‑Ready Interactive 3D Worlds from a Single Video☆89Oct 8, 2025Updated 4 months ago
- [Unofficial Implementation] Subject-driven Video Generation via Disentangled Identity and Motion☆58Jan 5, 2026Updated last month
- Video Diffusion Transformers are In-Context Learners☆36Jan 6, 2025Updated last year
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 7 months ago
- Extend the Conditioning of Stable Diffusion to take Audio Embeddings Instead of Text Embeddings using Wav2Vec2-BERT model☆13Sep 25, 2024Updated last year
- Code release for AccDiffusionV2 (TPAMI)☆35Nov 4, 2025Updated 3 months ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Feb 8, 2026Updated last week
- [NeurIPS 2025] ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression☆50Nov 4, 2025Updated 3 months ago
- Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives☆630Nov 26, 2025Updated 2 months ago
- Official implementation of project NoiseCLR, published at CVPR 2024☆28Jun 15, 2024Updated last year
- The official implementation of "Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers" (arXiv …☆50Jun 6, 2025Updated 8 months ago
- [ICCV 2025 Findings Oral] DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting☆40Nov 20, 2025Updated 2 months ago
- Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"☆58Jan 23, 2026Updated 3 weeks ago
- Scaling Zero-Shot Reference-to-Video Generation☆62Dec 11, 2025Updated 2 months ago
- ☆108Sep 3, 2025Updated 5 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025☆33May 1, 2025Updated 9 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Oct 18, 2024Updated last year
- ☆18Jun 10, 2025Updated 8 months ago
- [NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Tok…☆78Feb 10, 2026Updated last week
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆79Dec 10, 2024Updated last year