CVMI-Lab/VFMTok

(NeurIPS 2025) Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation

☆77

Alternatives and similar repositories for VFMTok

Users who are interested in VFMTok are comparing it to the libraries listed below.

CVMI-Lab / Hita
(ICCV 2025) Holistic Tokenizer for Autoregressive Image Generation
☆34 · Oct 9, 2025 · Updated 9 months ago
OneIG-Bench / OneIG-Benchmark
[NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models acro…
☆120 · Feb 10, 2026 · Updated 5 months ago
ZhangqiJiang07 / GEditBench_v2
GEditBench v2: A Human-Aligned Benchmark for General Image Editing
☆60 · Jun 18, 2026 · Updated last month
MKJia / DINO-Tok
[Arxiv'25] DINO-Tok: Adapting DINO for Visual Tokenizers
☆40 · Apr 11, 2026 · Updated 3 months ago
Kr1sJFU / iMontage
iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation
☆188 · Dec 1, 2025 · Updated 7 months ago
zhuangshaobin / WeTok
[ICLR2026] WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
☆69 · Sep 3, 2025 · Updated 10 months ago
nnnth / UniLIP
[ICLR 2026 🔥 ] Official implementation of "UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing"
☆151 · Jan 26, 2026 · Updated 5 months ago
Peyton-Chen / RegionE
[ICLR 2026] The official implementation of "RegionE: Adaptive Region-Aware Generation for Efficient Image Editing"
☆109 · Feb 3, 2026 · Updated 5 months ago
FoundationVision / UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
☆529 · Nov 14, 2025 · Updated 8 months ago
YuqingWang1029 / CubiD
[CVPR2026 Highlight] Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens https://arxiv.org/abs…
☆63 · Apr 10, 2026 · Updated 3 months ago
ZhengrongYue / UniFlow
Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"
☆143 · Oct 17, 2025 · Updated 9 months ago
bytetriper / RAE
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
☆1,978 · Feb 25, 2026 · Updated 4 months ago
VincentDENGP / 3D-LR
Can 3D Vision-Language Models Truly Understand Natural Language?
☆20 · Mar 28, 2024 · Updated 2 years ago
Jiawei-Yang / DeTok
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
☆195 · Feb 24, 2026 · Updated 4 months ago
Raphoo / linear-mech-vlms
Code for "Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models"
☆15 · Feb 16, 2026 · Updated 5 months ago
CVMI-Lab / clip-beyond-tail
(NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
☆27 · Oct 28, 2024 · Updated last year
MiniMax-AI / VTP
[ECCV 2026] Towards Scalable Pre-training of Visual Tokenizers for Generation
☆495 · Apr 15, 2026 · Updated 3 months ago
csuhan / Tar
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
☆202 · Sep 18, 2025 · Updated 10 months ago
ZhengrongYue / PAE
Official Implementation of "What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion"
☆74 · May 27, 2026 · Updated last month
huang-yh / SpectralAR
[ICCV 25] SpectralAR: Spectral Autoregressive Visual Generation
☆36 · Jun 13, 2025 · Updated last year
leoisufa / ICVE
[Preprint 2025] ICVE: In-Context Learning with Unpaired Clips for Instruction-based Video Editing
☆25 · Jun 2, 2026 · Updated last month
Doby-Xu / WithAnyone
✨ [ICLR'26] WithAnyone is capable of generating high-quality, controllable, and ID consistent images
☆571 · Mar 21, 2026 · Updated 4 months ago
ByteVisionLab / TokenFlow
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
☆464 · Aug 8, 2025 · Updated 11 months ago
Neur-IO / ReVQ
Explore how to get VQ-VAE models efficiently!
☆69 · Jul 24, 2025 · Updated 11 months ago
arijitray1993 / SAT
Spatial Aptitude Training for Multimodal Language Models
☆33 · Feb 8, 2026 · Updated 5 months ago
SilentView / GigaTok
[ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"
☆204 · Jan 7, 2026 · Updated 6 months ago
PKU-YuanGroup / UniSandBox
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
☆60 · Nov 27, 2025 · Updated 7 months ago
hustvl / VGT
Visual Generation Tuning
☆101 · Apr 16, 2026 · Updated 3 months ago
baaivision / Emu3.5
Native Multimodal Models are World Learners
☆1,537 · Dec 30, 2025 · Updated 6 months ago
yfChang-cv / FVQ
Official Implementation of Paper: FVQ: Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization (ICLR2026)
☆26 · Jan 30, 2026 · Updated 5 months ago
Dorniwang / UniVerse-1-code
The official UniVerse-1 code.
☆129 · Oct 13, 2025 · Updated 9 months ago
TencentARC / SEED-Voken
SEED-Voken: A Series of Powerful Visual Tokenizers
☆1,018 · Nov 25, 2025 · Updated 7 months ago
stepfun-ai / NextStep-1
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s …
☆689 · Feb 27, 2026 · Updated 4 months ago
MCG-NJU / Video-DC
☆12 · Jul 30, 2025 · Updated 11 months ago
ViStoryBench / vistorybench
[CVPR 2026] ViStoryBench: AI Story Visualization Benchmark
☆163 · May 10, 2026 · Updated 2 months ago
QwenLM / Qwen-Image-Bench
☆128 · Jun 18, 2026 · Updated last month
divyakraman / AerialDiffusion
Codebase for the paper Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models
☆13 · Oct 3, 2023 · Updated 2 years ago
CVMI-Lab / IST-Net
(ICCV2023) IST-Net: Prior-free Category-level Pose Estimation with Implicit Space Transformation
☆120 · Dec 7, 2023 · Updated 2 years ago
MME-Benchmarks / MME-Unify
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆42 · Apr 10, 2025 · Updated last year