mlvlab / Representation-Shift
Official PyTorch implementation of "Representation Shift: Unifying Token Compression with FlashAttention", ICCV 2025
☆30 Updated 6 months ago
Alternatives and similar repositories for Representation-Shift
Users who are interested in Representation-Shift are comparing it to the libraries listed below.
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1" ☆31 Updated 2 months ago
- ☆42 Updated 7 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ☆80 Updated last month
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆34 Updated 7 months ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆172 Updated last month
- ICML2025 ☆63 Updated 5 months ago
- Transactions on Multimedia (TMM25) ☆19 Updated 10 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ☆108 Updated 4 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆140 Updated last week
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆186 Updated 8 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆236 Updated 5 months ago
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward ☆60 Updated 2 months ago
- [ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs ☆56 Updated last week
- ☆37 Updated 7 months ago
- Official implementation of "STAR: Scale-wise Text-to-image generation via Auto-Regressive representations" ☆43 Updated 10 months ago
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations ☆132 Updated 5 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆35 Updated last year
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆48 Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆51 Updated 7 months ago
- The code repository of UniRL ☆51 Updated 8 months ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆101 Updated 6 months ago
- Evaluation codes and data for GenEval2 ☆55 Updated last month
- [TIP 2025] Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation ☆58 Updated last month
- Visual Generation Tuning ☆96 Updated 2 weeks ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆114 Updated 7 months ago
- ☆68 Updated 3 months ago
- Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer ☆136 Updated 3 months ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V… ☆34 Updated 6 months ago
- Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer ☆16 Updated last year
- ☆22 Updated 2 months ago