mindspore-lab / mindone
one for all, Optimal generator with No Exception
☆352 · Updated last week
Related projects:
- A collection of diffusion models based on MindSpore ☆157 · Updated 7 months ago
- A collection of awesome video generation studies. ☆258 · Updated last week
- A toolbox of vision models and algorithms based on MindSpore ☆232 · Updated last week
- [CVPR2024 Highlight] VBench - We Evaluate Video Generation ☆490 · Updated 2 weeks ago
- The official implementation of "Relay Diffusion: Unifying diffusion process across resolutions for image synthesis" [ICLR 2024 Spotlight] ☆260 · Updated 4 months ago
- An initiative to replicate Sora ☆98 · Updated 5 months ago
- A collection of awesome text-to-image generation studies. ☆326 · Updated last week
- Official code of SmartEdit [CVPR-2024 Highlight] ☆227 · Updated 3 months ago
- Efficient Multimodal Large Language Models: A Survey ☆230 · Updated last month
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters ☆488 · Updated this week
- MindFace is an open source toolkit based on MindSpore, containing the most advanced face recognition and detection models, such as ArcFa… ☆45 · Updated last year
- A list for Text-to-Video, Image-to-Video works ☆167 · Updated last month
- VideoTetris: Towards Compositional Text-To-Video Generation ☆197 · Updated 2 weeks ago
- A Collection of Papers and Codes for CVPR2024/ECCV2024 AIGC ☆409 · Updated last week
- [CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models ☆196 · Updated last week
- [CVPR 2024] DeepCache: Accelerating Diffusion Models for Free ☆756 · Updated 2 months ago
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). ☆293 · Updated 3 weeks ago
- Materials for the Hugging Face Diffusion Models Course ☆160 · Updated last year
- Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis ☆365 · Updated 3 months ago
- Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pret… ☆297 · Updated last week
- The official implementation of Latte: Latent Diffusion Transformer for Video Generation. ☆32 · Updated 6 months ago
- [ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ☆591 · Updated last month
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆378 · Updated 5 months ago
- Diffusion Model-Based Image Editing: A Survey (arXiv) ☆411 · Updated last month
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models ☆554 · Updated last month
- [ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model ☆357 · Updated 7 months ago