mindspore-lab / minddiffusion
A collection of diffusion models based on MindSpore
☆158 Updated 9 months ago
Related projects
Alternatives and complementary repositories for minddiffusion
- A toolbox of vision models and algorithms based on MindSpore ☆237 Updated 2 weeks ago
- one for all, Optimal generator with No Exception ☆364 Updated this week
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆181 Updated this week
- SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants. ☆995 Updated last week
- Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pret… ☆351 Updated this week
- [ICLR2024] The official implementation of paper "VDT: General-purpose Video Diffusion Transformers via Mask Modeling", by Haoyu Lu, Guoxi… ☆207 Updated 6 months ago
- The official implementation of "Relay Diffusion: Unifying diffusion process across resolutions for image synthesis" [ICLR 2024 Spotlight] ☆272 Updated 6 months ago
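- DeepSpeed tutorial & annotated examples & study notes (efficient training of large models) ☆116 Updated last year
- TaiSu (太素): a large-scale Chinese multimodal dataset (a 100-million-scale Chinese vision-language pre-training dataset) ☆175 Updated 11 months ago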
- A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models". ☆915 Updated last year
- A toolbox of yolo models and algorithms based on MindSpore ☆101 Updated this week
- The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision M… ☆492 Updated 7 months ago
- Lossless Training Speed Up by Unbiased Dynamic Data Pruning ☆317 Updated last month
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆365 Updated 4 months ago
- [CVPR2023] A faster, smaller, and better text-to-image model for large-scale training ☆228 Updated 10 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆522 Updated last month
- A collection of awesome text-to-image generation studies. ☆414 Updated this week
- PyTorch implementation of RCG https://arxiv.org/abs/2312.03701 ☆830 Updated last month
- [CVPR2024 Highlight] VBench - We Evaluate Video Generation ☆562 Updated this week
- Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023) ☆524 Updated 6 months ago
- Keras implementation of generative diffusion models ☆246 Updated 8 months ago
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks ☆283 Updated 10 months ago
- MindFace is an open source toolkit based on MindSpore, containing the most advanced face recognition and detection models, such as ArcFa… ☆46 Updated last year
- Diffusion Model-Based Image Editing: A Survey (arXiv) ☆471 Updated this week
- Research Code for Multimodal-Cognition Team in Ant Group ☆121 Updated 3 months ago
- Implementation of MagViT2 Tokenizer in Pytorch ☆559 Updated 3 weeks ago
- [ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model ☆383 Updated this week
- Personal Project: MPP-Qwen14B & MPP-Qwen-Next(Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conv… ☆376 Updated last month
- MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer ☆197 Updated 7 months ago
- Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion" ☆1,368 Updated last year