penghao-wu / ProxyV
[ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
☆20 Updated 6 months ago
Alternatives and similar repositories for ProxyV
Users that are interested in ProxyV are comparing it to the libraries listed below
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆127 Updated 3 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆84 Updated 8 months ago
- The official implementation of paper “VChain: Chain-of-Visual-Thought for Reasoning in Video Generation” ☆101 Updated last month
- Visual Spatial Tuning ☆133 Updated this week
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆147 Updated 2 months ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ☆115 Updated 2 weeks ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆157 Updated last month
- Official Implementation of Paper Transfer between Modalities with MetaQueries ☆269 Updated last month
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆93 Updated 8 months ago
- ☆58 Updated 2 weeks ago
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation ☆46 Updated 2 months ago
- ☆60 Updated 3 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆52 Updated 4 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆181 Updated last month
- [NeurIPS 2025 NextVid Workshop Oral✨] Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minim… ☆50 Updated 2 months ago
- ☆51 Updated 3 months ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning ☆52 Updated last month
- Official repository for ReasonGen-R1 ☆73 Updated 4 months ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V… ☆34 Updated 3 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆177 Updated 6 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning" ☆64 Updated last month
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 Updated 6 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆182 Updated 2 months ago
- This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA benchmark perform… ☆74 Updated 2 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆61 Updated last year
- ICML2025 ☆60 Updated 2 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆60 Updated 4 months ago
- A framework that allows you to apply Sparse AutoEncoder on any models ☆44 Updated 4 months ago
- A collection of vision foundation models unifying understanding and generation. ☆58 Updated 10 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆28 Updated 5 months ago