Image Tokenizer Needs Post-Training
☆24 · Oct 4, 2025 · Updated 4 months ago
Alternatives and similar repositories for RobusTok
Users interested in RobusTok are comparing it to the libraries listed below.
- ☆13 · Sep 2, 2023 · Updated 2 years ago
- Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model ☆13 · Dec 29, 2024 · Updated last year
- ☆14 · May 4, 2025 · Updated 9 months ago
- [ICML 2025 Tokshop] One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression ☆77 · Jul 30, 2025 · Updated 7 months ago
- [CVPR'25 - Rating 555] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text ☆53 · Mar 16, 2025 · Updated 11 months ago
- Test-time Scaling for VAR models ☆31 · Sep 19, 2025 · Updated 5 months ago
- MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning ☆42 · Sep 3, 2025 · Updated 6 months ago
- This repository is the official implementation for the paper “REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices”. ☆21 · Jul 27, 2025 · Updated 7 months ago
- ☆22 · Sep 26, 2024 · Updated last year
- ☆20 · Nov 14, 2022 · Updated 3 years ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆175 · Feb 24, 2026 · Updated last week
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆76 · Sep 19, 2025 · Updated 5 months ago
- Code of the paper "FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Princi… ☆28 · Aug 26, 2025 · Updated 6 months ago
- Stability-AI's SV3D (ECCV 2024 oral, Voleti et al.) in the diffusers convention. ☆31 · Feb 5, 2025 · Updated last year
- An implementation of 'simple diffusion: End-to-end diffusion for high resolution images' as published by Hoogeboom et al. ☆40 · Feb 9, 2025 · Updated last year
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago
- ☆32 · Dec 20, 2023 · Updated 2 years ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning ☆45 · Jul 2, 2025 · Updated 8 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 · Oct 21, 2024 · Updated last year
- [ICCV 2025] Amodal Depth Anything: Amodal Depth Estimation in the Wild ☆39 · Feb 21, 2026 · Updated last week
- Official code for the paper: Can3Tok (ICCV2025) ☆39 · Aug 23, 2025 · Updated 6 months ago
- ☆51 · Aug 22, 2025 · Updated 6 months ago
- [ICCV 2025] Pytorch implementation of "VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Pr… ☆49 · Jul 28, 2025 · Updated 7 months ago
- [3DV 2024] Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generalization ☆33 · Mar 17, 2025 · Updated 11 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆35 · Jul 1, 2024 · Updated last year
- [ICML'25] EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling. ☆174 · Jun 26, 2025 · Updated 8 months ago
- Marigold adapted for video estimation ☆30 · Mar 30, 2024 · Updated last year
- [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference ☆49 · Jun 17, 2025 · Updated 8 months ago
- [CVPR 2023] Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark ☆85 · Dec 20, 2022 · Updated 3 years ago
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat… ☆245 · Oct 12, 2025 · Updated 4 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆37 · Nov 27, 2024 · Updated last year
- Official Implementation of "Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance". ☆64 · Oct 16, 2025 · Updated 4 months ago
- Minimalist RL for Diffusion LLMs with SOTA reasoning performance (89.1% GSM8K). Official implementation of "The Flexibility Trap". ☆126 · Jan 24, 2026 · Updated last month
- ☆10 · Sep 4, 2021 · Updated 4 years ago
- Modern normalizing flows in Python. Simple to use and easily extensible. ☆12 · Feb 11, 2026 · Updated 2 weeks ago
- Twinkle✨: Training workbench to make your model glow. ☆45 · Updated this week
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models ☆29 · Feb 4, 2026 · Updated 3 weeks ago
- [ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ☆44 · Aug 9, 2025 · Updated 6 months ago
- Azure Machine Learning - MLOps Python SDKv2 ☆10 · Jul 24, 2023 · Updated 2 years ago