MattUnderscoreZhang / videopoet_replication
A replication of Google's VideoPoet model
☆12 · Updated last year
Alternatives and similar repositories for videopoet_replication:
Users interested in videopoet_replication are comparing it to the repositories listed below:
- Explore the Limits of Omni-modal Pretraining at Scale · ☆96 · Updated 5 months ago
- A repo tracking the latest autoregressive visual generation papers · ☆143 · Updated this week
- MoVQGAN: a model for image encoding and reconstruction · ☆218 · Updated last year
- 📚 Collection of awesome generation acceleration resources · ☆142 · Updated this week
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching · ☆92 · Updated 7 months ago
- Official repo for StableLLAVA · ☆94 · Updated last year
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective · ☆59 · Updated 3 months ago
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" · ☆214 · Updated 3 weeks ago
- Scaling Diffusion Transformers with Mixture of Experts · ☆260 · Updated 5 months ago
- 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation" · ☆256 · Updated this week
- STAR: Scale-wise Text-to-image generation via Auto-Regressive representations · ☆136 · Updated this week
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" · ☆128 · Updated 8 months ago
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" · ☆128 · Updated 3 months ago
- [NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers" · ☆182 · Updated 4 months ago
- Official implementation of the Law of Vision Representation in MLLMs · ☆149 · Updated 3 months ago
- Matryoshka Multimodal Models · ☆97 · Updated last month
- The official implementation of Diffusion-KTO: Aligning Diffusion Models by Optimizing Human Utility · ☆36 · Updated last month
- Official implementation of the ICLR'24 paper "Kosmos-G: Generating Images in Context with Multimodal Large Language Models" · ☆64 · Updated 8 months ago
- [CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale · ☆204 · Updated 11 months ago
- minisora-DiT, a DiT reproduction based on XTuner from the open-source community MiniSora · ☆40 · Updated 10 months ago
- Liquid: Language Models are Scalable Multi-modal Generators · ☆65 · Updated this week
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient · ☆75 · Updated 3 weeks ago
- A collection of awesome papers on alignment of diffusion models · ☆113 · Updated last week
- [ICML 2024] On Discrete Prompt Optimization for Diffusion Models (Google) · ☆47 · Updated 6 months ago
- The official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality" · ☆45 · Updated last month
- Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment" · ☆25 · Updated last week