Manchery / awesome-visual-tokenizer
[WIP🚧] 2025 up-to-date list of resources on visual tokenizers (primarily for visual generation). Give it a star 🌟 if you find it useful.
☆11 · Updated 5 months ago
Alternatives and similar repositories for awesome-visual-tokenizer
Users interested in awesome-visual-tokenizer are comparing it to the libraries listed below.
- Official repository of Personalized Visual Instruct Tuning ☆28 · Updated 3 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆31 · Updated 3 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆14 · Updated last month
- Official implementation of LaVin-DiT ☆32 · Updated 4 months ago
- ☆36 · Updated 2 weeks ago
- Official code repository of the CVPR 2025 paper PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation ☆31 · Updated 2 months ago
- [NeurIPS 2023] Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector ☆36 · Updated last year
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆34 · Updated 8 months ago
- [ECCV 2024] Official PyTorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts" ☆43 · Updated 11 months ago
- ☆12 · Updated 4 months ago
- [ICLR 2025] IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis ☆33 · Updated 3 months ago
- [ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆74 · Updated 8 months ago
- Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models ☆20 · Updated 2 weeks ago
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis" ☆45 · Updated 11 months ago
- [NeurIPS 2024] Official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 · Updated 11 months ago
- [ICML 2025] DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization ☆11 · Updated last week
- ☆43 · Updated 5 months ago
- [CVPR 2025] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning ☆29 · Updated 2 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆44 · Updated 3 weeks ago
- [ICLR 2025] Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ☆39 · Updated last month
- ☆81 · Updated 2 months ago
- An instruction-data generation system for multimodal language models ☆33 · Updated 4 months ago
- Codebase for the paper "Elucidating the Design Space of Language Models for Image Generation" ☆45 · Updated 6 months ago
- [CVPR 2024 Highlight] ImageNet-D ☆43 · Updated 7 months ago
- [ICML 2025] Code and data for the paper "Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation" ☆107 · Updated 7 months ago
- VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models ☆19 · Updated 2 months ago
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆16 · Updated last week
- ☆111 · Updated last week
- Official implementation for "Editing Massive Concepts in Text-to-Image Diffusion Models" ☆19 · Updated last year
- [ICLR 2025] Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching ☆46 · Updated last month