NVlabs/QLIP

[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation

☆97

Alternatives and similar repositories for QLIP

Users interested in QLIP are comparing it to the repositories listed below.

SilentView / GigaTok
[ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"
☆204 · Jan 7, 2026 · Updated 6 months ago
mit-han-lab / vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
☆425 · Apr 25, 2025 · Updated last year
FoundationVision / UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
☆529 · Nov 14, 2025 · Updated 8 months ago
zhaoyue-zephyrus / npq-vit
[ICLR 2025] Binary Spherical Quantization + [CVPR 2026] Leech Spherical Quantization
☆221 · Dec 18, 2025 · Updated 7 months ago
rongyaofang / PUMA
Empowering Unified MLLM with Multi-granular Visual Generation
☆132 · Jan 16, 2025 · Updated last year
elad-amrani / xtra
PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025
☆14 · Nov 21, 2025 · Updated 8 months ago
lxa9867 / ImageFolder
High-performance Image Tokenizers for VAR and AR
☆307 · Apr 25, 2025 · Updated last year
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374 · Jul 24, 2025 · Updated 11 months ago
TencentARC / Divot
Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)
☆87 · Feb 27, 2025 · Updated last year
ByteVisionLab / TokenFlow
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
☆464 · Aug 8, 2025 · Updated 11 months ago
ziqipang / RandAR
[CVPR 2025 (Oral)] Open implementation of "RandAR"
☆208 · Jul 14, 2025 · Updated last year
Jiawei-Yang / DeTok
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
☆195 · Feb 24, 2026 · Updated 4 months ago
MonoFormer / MonoFormer
The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression"
☆92 · Oct 12, 2024 · Updated last year
wdrink / SimpleAR
PyTorch implementation for the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
☆431 · Jun 20, 2025 · Updated last year
microsoft / VidTok
A family of versatile, state-of-the-art video tokenizers.
☆453 · Sep 1, 2025 · Updated 10 months ago
SOTAMak1r / GST
[ICLR 2025] Where Am I and What Will I See : An Auto-Regressive Model for Spatial Localization and View Prediction
☆45 · Aug 9, 2025 · Updated 11 months ago
ShivamDuggal4 / adaptive-length-tokenizer
Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth?
☆146 · Feb 11, 2025 · Updated last year
inclusionAI / Ming-UniVision
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
☆143 · Oct 14, 2025 · Updated 9 months ago
wusize / Harmon
[ICCV 2025] Code release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
☆191 · May 21, 2025 · Updated last year
turingmotors / One-D-Piece
[ICML 2025 Tokshop] One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
☆81 · Jul 30, 2025 · Updated 11 months ago
FoundationVision / LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
☆1,959 · Aug 15, 2024 · Updated last year
zh460045050 / VQGAN-LC
☆145 · Jun 28, 2024 · Updated 2 years ago
OliverRensu / xAR
This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat…
☆251 · Oct 12, 2025 · Updated 9 months ago
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆80 · Dec 10, 2024 · Updated last year
markweberdev / maskbit
Implementation of the paper "MaskBit: Embedding-free Image Generation from Bit Tokens"
☆94 · Apr 10, 2025 · Updated last year
FoundationVision / Liquid
(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators
☆642 · Jun 1, 2026 · Updated last month
haoningwu3639 / MegaFusion
[WACV 2025] MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
☆101 · Apr 17, 2025 · Updated last year
PKU-YuanGroup / Next-Patch-Prediction
[AAAI26] Next Patch Prediction
☆129 · Jan 2, 2025 · Updated last year
Hhhhhhao / continuous_tokenizer
☆321 · May 29, 2025 · Updated last year
zhangjiewu / awesome-t2i-eval
A curated list of papers and resources for text-to-image evaluation.
☆30 · Sep 6, 2023 · Updated 2 years ago
csuhan / Tar
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
☆202 · Sep 18, 2025 · Updated 10 months ago
ZhengrongYue / UniFlow
Official Implementation of "UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation"
☆143 · Oct 17, 2025 · Updated 9 months ago
causalfusion / causalfusion
☆196 · Dec 17, 2024 · Updated last year
Neur-IO / ReVQ
Explore how to get VQ-VAE models efficiently!
☆69 · Jul 24, 2025 · Updated 11 months ago
ThisisBillhe / NAR
[ICCV 2025] The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation"
☆62 · Apr 5, 2025 · Updated last year
Beckschen / ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
☆211 · Jun 9, 2024 · Updated 2 years ago
fabbrimatteo / VHA
This repository contains the source code related to the paper Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
☆11 · Jun 23, 2020 · Updated 6 years ago
FoundationVision / OmniTokenizer
[NeurIPS 2024] OmniTokenizer: One model and one weight for image-video joint tokenization.
☆325 · Jul 9, 2024 · Updated 2 years ago
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,963 · Jan 8, 2026 · Updated 6 months ago