huggingface / open-muse
Open reproduction of MUSE for fast text2image generation.
☆321 Updated 3 months ago
Related projects:
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" ☆371 Updated 5 months ago
- Huggingface-compatible SDXL Unet implementation that is readily hackable ☆381 Updated last year
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis ☆305 Updated 10 months ago
- Get hundreds of millions of image+url pairs from the crawling at home dataset and preprocess them ☆201 Updated 3 months ago
- Easily create large video datasets from video urls ☆530 Updated last month
- ☆415 Updated 7 months ago
- Better Aligning Text-to-Image Models with Human Preference. ICCV 2023 ☆264 Updated last year
- [NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training" ☆292 Updated 3 months ago
- Large-scale text-video dataset. 10 million captioned short videos. ☆575 Updated last month
- Simple large-scale training of stable diffusion with multi-node support. ☆122 Updated last year
- PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. ☆371 Updated 4 months ago
- An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal … ☆358 Updated 9 months ago
- AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more samp… ☆228 Updated 6 months ago
- Code for instruction-tuning Stable Diffusion. ☆190 Updated 7 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆498 Updated 2 months ago
- ReVersion: Diffusion-Based Relation Inversion from Images ☆454 Updated 2 months ago
- ☆147 Updated last year
- Official implementation of SEED-LLaMA (ICLR 2024). ☆557 Updated 5 months ago
- Official implementation of "Controlling Text-to-Image Diffusion by Orthogonal Finetuning". ☆279 Updated 9 months ago
- [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers ☆490 Updated 2 months ago
- Code for "Diffusion Model Alignment Using Direct Preference Optimization" ☆229 Updated 8 months ago
- 🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆417 Updated 8 months ago
- Description and pointers of laion datasets ☆230 Updated last year
- DataComp: In search of the next generation of multimodal datasets ☆637 Updated 8 months ago
- ☆328 Updated last year
- A linear estimator on top of clip to predict the aesthetic quality of pictures ☆439 Updated 2 years ago
- LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation ☆440 Updated 10 months ago
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi… ☆407 Updated last week
- This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation ☆394 Updated last week
- Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis ☆365 Updated 3 months ago