mlvlab / Representation-Shift
Official Implementation (Pytorch) of the "Representation Shift: Unifying Token Compression with FlashAttention", ICCV 2025
☆28Updated 4 months ago
Alternatives and similar repositories for Representation-Shift
Users who are interested in Representation-Shift are comparing it to the libraries listed below.
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆30Updated 3 weeks ago
- ICML2025☆61Updated 3 months ago
- [ICCV2025]Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆178Updated 6 months ago
- ☆14Updated 3 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆234Updated 3 months ago
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward☆49Updated 2 weeks ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆125Updated 3 weeks ago
- Official repository for ReasonGen-R1☆73Updated 5 months ago
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning☆100Updated 6 months ago
- ☆37Updated 5 months ago
- The code repository of UniRL☆47Updated 6 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient☆107Updated 2 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model☆112Updated 5 months ago
- Code for "How far can we go with ImageNet for Text-to-Image generation?" paper☆94Updated 3 weeks ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆71Updated 3 weeks ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆168Updated last month
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆68Updated last month
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories☆83Updated 4 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆177Updated 2 weeks ago
- [CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos☆91Updated 7 months ago
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations☆124Updated 3 months ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆131Updated last month
- [IEEE TIP 2025] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation☆57Updated 2 weeks ago
- Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".☆138Updated 2 months ago
- T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation☆33Updated 2 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning☆129Updated 8 months ago
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.☆49Updated 8 months ago
- [ECCV2024]The official implementation of the DiffPNG paper in PyTorch.☆15Updated last year
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations☆190Updated 2 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆46Updated 10 months ago