SHI-Labs / IMG-Multimodal-Diffusion-Alignment
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance, ICCV 2025
☆29, updated 3 weeks ago
Alternatives and similar repositories for IMG-Multimodal-Diffusion-Alignment
Users who are interested in IMG-Multimodal-Diffusion-Alignment are comparing it to the repositories listed below.
- [NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis ☆24, updated 11 months ago
- CODA: Repurposing Continuous VAEs for Discrete Tokenization ☆31, updated 3 months ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆34, updated last year
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆184, updated 2 months ago
- ☆53, updated 2 months ago
- [ECCV 2024] Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators ☆45, updated last year
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆32, updated 11 months ago
- Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models. ☆34, updated this week
- ☆99, updated 3 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ☆60, updated last month
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation