FoundationVision / vaex
🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook
☆34 · Updated 2 months ago
Related projects:
- Implements VAR+CLIP for image generation ☆64 · Updated last month
- ICCV2023-Diffusion-Papers ☆110 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆103 · Updated 3 weeks ago
- ☆89 · Updated 4 months ago
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ☆75 · Updated 2 months ago
- This is the official implementation for ControlVAR. ☆28 · Updated last week
- RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with t… ☆96 · Updated 2 months ago
- An in-context conditioning version of MUSE with pre-trained checkpoints. ☆105 · Updated last year
- ☆72 · Updated 5 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 · Updated last week
- Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language ☆20 · Updated 2 months ago
- official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" ☆37 · Updated this week
- Official repository of paper "Subobject-level Image Tokenization" ☆58 · Updated 4 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆85 · Updated 2 weeks ago
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis" ☆26 · Updated 3 months ago
- [CVPR 2024] On the Content Bias in Fréchet Video Distance ☆73 · Updated last month
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation ☆23 · Updated 2 weeks ago
- Unofficial implementation of "SODA: Bottleneck Diffusion Models for Representation Learning" ☆73 · Updated 5 months ago
- [BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization ☆74 · Updated 3 months ago
- ☆32 · Updated 3 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆43 · Updated 3 months ago
- Official implementation of the Law of Vision Representation in MLLMs ☆93 · Updated last week
- ☆104 · Updated 2 months ago
- ☆113 · Updated 2 months ago
- LLMBind: A Unified Modality-Task Integration Framework ☆14 · Updated 3 months ago
- ☆25 · Updated last month
- Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models ☆76 · Updated 8 months ago
- Code release for LayoutDiffuse ☆47 · Updated last year
- DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention ☆106 · Updated 3 months ago
- [CVPR 2023] Zero-shot Generative Model Adaptation via Image-specific Prompt Learning ☆82 · Updated last year