TencentARC / Open-MAGVIT2

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

☆705

Related projects

Alternatives and complementary repositories for Open-MAGVIT2

bytedance / 1d-tokenizer
This repo contains the code for 1D tokenizer and generator
☆548 · Updated this week
lucidrains / magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
☆564 · Updated last month
sihyun-yu / REPA
Official Pytorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
☆664 · Updated this week
willisma / SiT
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
☆682 · Updated 8 months ago
FoundationVision / LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,324 · Updated 3 months ago
LTH14 / mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
☆1,011 · Updated last month
showlab / Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,029 · Updated last week
whlzy / FiT
[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model
☆389 · Updated last week
snap-research / Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
☆526 · Updated 3 weeks ago
mira-space / MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
☆371 · Updated 2 months ago
sail-sg / MDT
Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
☆526 · Updated 6 months ago
lucidrains / transfusion-pytorch
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
☆725 · Updated this week
NVIDIA / Cosmos-Tokenizer
A suite of image and video neural tokenizers
☆796 · Updated last week
mit-han-lab / hart
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
☆340 · Updated last month
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆532 · Updated last month
Vchitect / VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
☆580 · Updated 2 weeks ago
AILab-CVC / SEED
Official implementation of SEED-LLaMA (ICLR 2024).
☆579 · Updated last month
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆395 · Updated 7 months ago
Alpha-VLLM / Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini…
☆500 · Updated 3 months ago
ai-forever / MoVQGAN
MoVQGAN - model for the image encoding and reconstruction
☆197 · Updated last year
Meituan-AutoML / VisionLLaMA
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
☆367 · Updated 4 months ago
SalesforceAIResearch / DiffusionDPO
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
☆269 · Updated 10 months ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆336 · Updated this week
mira-space / Mira
☆349 · Updated last month
buoyancy99 / diffusion-forcing
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
☆615 · Updated last week
showlab / Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆215 · Updated 2 weeks ago
allenai / unified-io-2
☆573 · Updated 9 months ago
Anima-Lab / MaskDiT
Code for Fast Training of Diffusion Models with Masked Transformers
☆373 · Updated 6 months ago
magic-research / PLLaVA
Official repository for the paper PLLaVA
☆593 · Updated 3 months ago
mihirp1998 / AlignProp
AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more samp…
☆242 · Updated 2 weeks ago