blairstar / The_Art_of_DPM
An In-depth Analysis of Diffusion Probability Model
☆118Updated last year
Alternatives and similar repositories for The_Art_of_DPM
Users that are interested in The_Art_of_DPM are comparing it to the libraries listed below
- https://www.shoufachen.com/Awesome-Diffusion-Transformers/☆151Updated last year
- Keras implementation of Finite Scalar Quantization☆83Updated 2 years ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder. Unifies image understanding and generation.☆39Updated last year
- A lightweight, highly efficient training framework for accelerating diffusion tasks.☆51Updated last year
- ☆33Updated 9 months ago
- An initiative to replicate Sora☆104Updated last year
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"☆268Updated this week
- ☆82Updated 2 years ago
- The official GitHub page for the survey paper "Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey". And this paper is unde…☆76Updated 5 months ago
- Inference-only implementation of "One-Step Diffusion Distillation through Score Implicit Matching" [NeurIPS 2024]☆84Updated last year
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.☆268Updated last month
- Official code for the paper "Exploring Domain Incremental Video Highlights Detection with the LiveFood Benchmark"☆40Updated 2 years ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)☆144Updated 11 months ago
- A list for Text-to-Video, Image-to-Video works☆251Updated 7 months ago
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption"☆38Updated 7 months ago
- Our 2nd-gen LMM☆34Updated last year
- ☆34Updated 2 years ago
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)☆189Updated 2 years ago
- ChatSD is designed to make image generation tasks easy☆21Updated 2 years ago
- ☆72Updated 2 years ago
- Video dataset dedicated to portrait-mode video recognition.☆55Updated 2 months ago
- differentiable top-k operator☆22Updated last year
- ☆118Updated 2 years ago
- A replication of Google's VideoPoet model☆11Updated last year
- Follow the rapid development of AIGC models and applications. 🚀☆81Updated 2 years ago
- The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision M…☆506Updated last year
- [NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models☆170Updated last year
- OpenVideo specializes in the domain of text-to-video generation, with the goal of providing high-quality and diverse video datasets to AI…☆113Updated 7 months ago
- Margin-based Vision Transformer☆61Updated last month
- Chinese CLIP models with SOTA performance.☆60Updated 2 years ago