mini-sora / MiniSora-DiT
MiniSora-DiT, a DiT (Diffusion Transformer) reproduction based on XTuner, from the open-source community MiniSora
☆38 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for MiniSora-DiT
- A lightweight and highly efficient training framework for accelerating diffusion tasks. ☆41 · Updated 2 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆39 · Updated 3 months ago
- ☆35 · Updated 5 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆47 · Updated this week
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆91 · Updated 2 weeks ago
- Scaling RWKV-Like Architectures for Diffusion Models ☆117 · Updated 7 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆26 · Updated last month
- Video dataset dedicated to portrait-mode video recognition. ☆36 · Updated 7 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆89 · Updated 2 months ago
- The official implementation of Latte: Latent Diffusion Transformer for Video Generation. ☆32 · Updated 8 months ago
- This repository provides an improved LlamaGen model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆27 · Updated last month
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆77 · Updated 7 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆38 · Updated 4 months ago
- TerDiT: Ternary Diffusion Models with Transformers ☆62 · Updated 5 months ago
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing ☆91 · Updated 2 weeks ago
- Inference-only implementation of "One-Step Diffusion Distillation through Score Implicit Matching" [NIPS 2024] ☆36 · Updated this week
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆17 · Updated last month
- A Training-free Iterative Framework for Long Story Visualization ☆61 · Updated this week
- ☆38 · Updated 4 months ago
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆137 · Updated 3 weeks ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆75 · Updated 4 months ago
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models ☆115 · Updated last month
- ☆127 · Updated 2 weeks ago
- 🏞️ Official implementation of "Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition" ☆102 · Updated 6 months ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆84 · Updated 4 months ago
- ☆193 · Updated 4 months ago
- Code Release for the paper "Make-A-Story: Visual Memory Conditioned Consistent Story Generation" in CVPR 2023 ☆37 · Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆87 · Updated 2 weeks ago