mini-sora / MiniSora-DiT
MiniSora-DiT, a DiT reproduction based on XTuner from the open-source MiniSora community
☆40Updated last year
Alternatives and similar repositories for MiniSora-DiT:
Users who are interested in MiniSora-DiT are comparing it to the libraries listed below.
- A lightweight and highly efficient training framework for accelerating diffusion tasks.☆46Updated 6 months ago
- Finetuning and inference tools for the CogView4 and CogVideoX model series.☆29Updated this week
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆35Updated 9 months ago
- ☆47Updated 3 months ago
- ☆50Updated last month
- The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation"☆32Updated 2 weeks ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 6 months ago
- ☆49Updated 3 months ago
- ☆31Updated 2 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆86Updated 2 months ago
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation☆29Updated 4 months ago
- ☆19Updated last year
- Inference-only implementation of "One-Step Diffusion Distillation through Score Implicit Matching" [NIPS 2024]☆77Updated 4 months ago
- ☆33Updated last month
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆41Updated 9 months ago
- [CVPR2025] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project/☆129Updated 2 weeks ago
- CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method☆26Updated 11 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated 7 months ago
- ☆62Updated 7 months ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation☆79Updated 11 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆63Updated last month
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆26Updated last month
- Scaling RWKV-Like Architectures for Diffusion Models☆126Updated 11 months ago
- ☆102Updated 9 months ago
- Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing☆25Updated 3 months ago
- ☆25Updated last year
- ☆140Updated 2 months ago
- Implementation code of the paper MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing☆41Updated 3 weeks ago
- LMM solved catastrophic forgetting, AAAI2025☆40Updated 4 months ago
- [ICLR 2025] Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching☆41Updated last month