mini-sora / MiniSora-DiT
minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora
☆40Updated 11 months ago
Alternatives and similar repositories for MiniSora-DiT:
Users that are interested in MiniSora-DiT are comparing it to the libraries listed below
- A light-weight and high-efficient training framework for accelerating diffusion tasks.☆46Updated 5 months ago
- LLaVA combines with Magvit Image tokenizer, training MLLM without an Vision Encoder. Unifying image understanding and generation.☆35Updated 8 months ago
- ☆40Updated 7 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆79Updated last month
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 5 months ago
- Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing☆23Updated 2 months ago
- ☆19Updated last year
- TerDiT: Ternary Diffusion Models with Transformers☆68Updated 8 months ago
- ☆18Updated last year
- [CVPR2025] PAR: Parallelized Autoregressive Visual Generation. https://epiphqny.github.io/PAR-project/☆124Updated 2 months ago
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation☆29Updated 3 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt…☆30Updated 4 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆40Updated 8 months ago
- ☆25Updated last year
- ☆19Updated last year
- CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method☆26Updated 10 months ago
- Code Release for the paper "Make-A-Story: Visual Memory Conditioned Consistent Story Generation" in CVPR 2023☆39Updated last year
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression☆53Updated last week
- FORA introduces simple yet effective caching mechanism in Diffusion Transformer Architecture for faster inference sampling.☆38Updated 7 months ago
- The official repo of continuous speculative decoding☆24Updated 3 months ago
- Implementation of SmoothCache, a project aimed at speeding-up Diffusion Transformer (DiT) based GenAI models with error-guided caching.☆38Updated last month
- ☆47Updated 2 months ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation☆78Updated 10 months ago
- 🏞️ Official implementation of "Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition"☆104Updated 9 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation☆85Updated 5 months ago
- ☆55Updated 3 months ago