penghao-wu / ProxyV
[ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
☆19Updated 3 months ago
Alternatives and similar repositories for ProxyV
Users that are interested in ProxyV are comparing it to the libraries listed below
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆117Updated 3 weeks ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)☆81Updated 6 months ago
- A collection of vision foundation models unifying understanding and generation.☆57Updated 8 months ago
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation☆45Updated 3 weeks ago
- Official repository for ReasonGen-R1☆69Updated 2 months ago
- ICML2025☆57Updated 3 weeks ago
- Official Implementation of Paper Transfer between Modalities with MetaQueries☆233Updated 2 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness☆55Updated last month
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.☆37Updated 5 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆33Updated 3 weeks ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆79Updated 2 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆162Updated 3 months ago
- A framework that allows you to apply Sparse AutoEncoder on any models☆40Updated 2 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆60Updated 4 months ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"☆127Updated last month
- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization☆23Updated 5 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆87Updated 6 months ago
- ☆50Updated 3 weeks ago
- ☆43Updated last month
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆58Updated 3 months ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs☆48Updated last month
- [ICLR'25] Reconstructive Visual Instruction Tuning☆116Updated 5 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆53Updated 2 months ago
- This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V…☆34Updated last month
- Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention☆39Updated 5 months ago
- A list of works on video generation towards world model☆165Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆73Updated 2 months ago
- [ICLR 2024] Official implementation of the paper "Toss: High-quality text-guided novel view synthesis from a single image"☆22Updated last year
- The official repository of "Sekai: A Video Dataset towards World Exploration"☆153Updated 2 months ago
- Structured Video Comprehension of Real-World Shorts☆193Updated this week