mhh0318 / UniD3Links
☆54Updated 2 years ago
Alternatives and similar repositories for UniD3
Users that are interested in UniD3 are comparing it to the libraries listed below
Sorting:
- (wip) Use LAION-AI's CLIP "conditoned prior" to generate CLIP image embeds from CLIP text embeds.☆27Updated 2 years ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆56Updated last year
- Simple script to compute CLIP-based scores given a DALL-e trained model.☆30Updated 4 years ago
- [ICLR2023] Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation (CDCD).☆159Updated 2 years ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆45Updated last year
- [ACM MM 2022] Towards Counterfactual Image Manipulation via CLIP☆37Updated 2 years ago
- ☆59Updated last year
- ☆50Updated 2 years ago
- Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch☆64Updated 3 years ago
- Code for CVPR'2022 paper ✨ "Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-L…☆37Updated 3 years ago
- ☆46Updated last year
- Codebase for the paper-Elucidating the design space of language models for image generation☆45Updated 7 months ago
- Official implementation of the paper "Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models☆172Updated last year
- [ICCV 2023] Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models☆84Updated last year
- ☆48Updated 3 months ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆29Updated last year
- Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"☆27Updated last year
- Research code for paper "Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis"☆112Updated 7 months ago
- An in-context conditioning version of MUSE with pre-trained checkpoints.☆112Updated 2 years ago
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆43Updated 5 months ago
- multimodal video-audio-text generation and retrieval between every pair of modalities on the MUGEN dataset. The repo. contains the traini…☆40Updated 2 years ago
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)☆33Updated 2 years ago
- Training code for CLIP-FlanT5☆26Updated 10 months ago
- DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models☆46Updated last year
- ☆63Updated last week
- ☆80Updated 7 months ago
- Minimal multi-gpu implementation of EDM2: "Analyzing and Improving the Training Dynamics of Diffusion Models"☆32Updated last year
- ☆40Updated last year
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data☆34Updated last year
- ☆55Updated last year