Manchery / awesome-visual-tokenizer
[WIP🚧] 2025 up-to-date list of resources on visual tokenizers (primarily for visual generation). Give it a star if you find it useful.
★20 · Updated last year
Alternatives and similar repositories for awesome-visual-tokenizer
Users who are interested in awesome-visual-tokenizer are comparing it to the repositories listed below
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision · ★192 · Updated 2 weeks ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models · ★92 · Updated last year
- Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation · ★22 · Updated 5 months ago
- List of diffusion related active submissions on OpenReview for ICLR 2025. · ★51 · Updated last year
- ★80 · Updated 6 months ago
- [NIPS2023] Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector · ★37 · Updated last year
- [ICLR 2025] SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation · ★50 · Updated 11 months ago
- [ECCV 2024] Official pytorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts" · ★47 · Updated last year
- [NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing · ★29 · Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT · ★112 · Updated 2 months ago
- a collection of awesome autoregressive visual generation models · ★79 · Updated 8 months ago
- [CVPR 2024 Highlight] ImageNet-D · ★46 · Updated last year
- ★41 · Updated last year
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. · ★32 · Updated 5 months ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation · ★143 · Updated last year
- ★57 · Updated 2 years ago
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training" · ★195 · Updated 7 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… · ★79 · Updated last year
- ★104 · Updated 2 months ago
- [ICML 2024] On Discrete Prompt Optimization for Diffusion Models - Google · ★63 · Updated last year
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" · ★167 · Updated 3 weeks ago
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024… · ★60 · Updated last year
- Autoregressive Image Generation with Randomized Parallel Decoding · ★82 · Updated 2 months ago
- fixed official code for paper "A Closer Look at Parameter-Efficient Tuning in Diffusion Models". · ★42 · Updated 2 years ago
- Official code for ICLR 2024 paper "Do Generated Data Always Help Contrastive Learning?" · ★31 · Updated last year
- ★38 · Updated 2 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling · ★40 · Updated 10 months ago
- Training code for CLIP-FlanT5 · ★30 · Updated last year
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24). · ★40 · Updated last year
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation · ★94 · Updated 10 months ago