cofe-ai / O2-MAGVIT2Links

Open Source Implementation of Dual Modality MAGVIT2 Tokenizer

☆21

Alternatives and similar repositories for O2-MAGVIT2

Users that are interested in O2-MAGVIT2 are comparing it to the libraries listed below

Sorting:

TencentARC / SEED-Voken
SEED-Voken: A Series of Powerful Visual Tokenizers
☆957Updated this week
lucidrains / transfusion-pytorch
Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
☆1,241Updated 2 weeks ago
bytedance / 1d-tokenizer
This repo contains the code for 1D tokenizer and generator
☆1,061Updated 7 months ago
lucidrains / magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
☆642Updated 9 months ago
willisma / SiT
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
☆995Updated last year
LTH14 / mar
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
☆1,771Updated last year
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆644Updated last month
ByteVisionLab / TokenFlow
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
☆395Updated 2 months ago
TRI-ML / prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
☆830Updated last year
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,751Updated this week
mit-han-lab / vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
☆398Updated 6 months ago
sihyun-yu / REPA
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
☆1,382Updated 7 months ago
kvfrans / shortcut-models
☆659Updated 10 months ago
buoyancy99 / diffusion-forcing
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
☆1,049Updated 6 months ago
test-time-training / ttt-lm-jax
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
☆423Updated last year
rese1f / Awesome-VQVAE
A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its application
☆318Updated 8 months ago
zipzou / hf-multitask-trainer
The trainer for HF to record losses of different tasks and objectives.
☆46Updated 7 months ago
naver-ai / rope-vit
[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"
☆404Updated 10 months ago
LMM101 / Awesome-Multimodal-Next-Token-Prediction
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
☆453Updated 9 months ago
Gsunshine / meanflow
JAX implementation of MeanFlow
☆456Updated 2 months ago
bytetriper / RAE
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
☆1,311Updated last week
lxa9867 / ImageFolder
High-performance Image Tokenizers for VAR and AR
☆292Updated 6 months ago
showlab / Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆725Updated 2 weeks ago
valeoai / Halton-MaskGIT
[ICLR2025] Halton Scheduler for Masked Generative Image Transformer
☆269Updated 3 months ago
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆353Updated 3 months ago
h-zhao1997 / cobra
[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
☆289Updated 9 months ago
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆594Updated last year
FoundationVision / LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,879Updated last year
pkunlp-icler / FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Langua…
☆502Updated 9 months ago
FanqingM / MM-Eureka-V0
MM-Eureka V0 also called R1-Multimodal-Journey, Latest version is in MM-Eureka
☆320Updated 4 months ago