kakaobrain / magvlt
The official implementation of MAGVLT: Masked Generative Vision-and-Language Transformer (CVPR'23)
☆26 Updated 10 months ago
Related projects
Alternatives and complementary repositories for magvlt
- Official Implementation (Pytorch) of "Constant Acceleration Flow", NeurIPS 2024 ☆26 Updated this week
- Official Pytorch Implementation of Our CVPR2023 Paper: "Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image… ☆53 Updated last year
- Locally Hierarchical Auto-Regressive Modeling for Image Generation (HQ-Transformer) ☆26 Updated 9 months ago
- An official pytorch implementation of AAAI 2024 paper "Latent Space Editing in Transformer-based Flow Matching" ☆27 Updated 7 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆41 Updated 3 weeks ago
- ☆45 Updated 6 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆52 Updated last year
- ☆25 Updated last week
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆77 Updated 5 months ago
- ☆104 Updated 4 months ago
- Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch ☆64 Updated 2 years ago
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆70 Updated 9 months ago
- [ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT ☆51 Updated 3 months ago
- The official PyTorch implementation of Fast Diffusion Model ☆91 Updated last year
- ☆48 Updated last year
- https://arxiv.org/abs/2209.15162 ☆48 Updated last year
- [ICLR-2023] Rarity Score : A New Metric to Evaluate the Uncommonness of Synthesized Images ☆62 Updated 2 years ago
- Data-Efficient Multimodal Fusion on a Single GPU ☆47 Updated 6 months ago
- Minimal multi-gpu implementation of EDM2: "Analyzing and Improving the Training Dynamics of Diffusion Models" ☆26 Updated 8 months ago
- Codebase for the Paper: Learning Visual Styles from Audio-Visual Associations (ECCV 2022, in PyTorch) ☆14 Updated last year
- A big_vision inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆30 Updated 4 months ago
- Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization" ☆95 Updated 2 years ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 Updated 3 months ago
- This repository is for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023) ☆22 Updated 11 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆42 Updated 3 weeks ago
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data ☆32 Updated 8 months ago
- Official Implementation for "Consistency Flow Matching: Defining Straight Flows with Velocity Consistency" ☆153 Updated 4 months ago
- The codebase of our paper "Improving the Training of Rectified Flows" ☆83 Updated last month
- ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021. ☆54 Updated 3 years ago