penghao-wu / ProxyV
[ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
☆19Updated 3 months ago
Alternatives and similar repositories for ProxyV
Users that are interested in ProxyV are comparing it to the libraries listed below
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆106Updated last week
- A framework that allows you to apply Sparse AutoEncoder on any models☆36Updated last month
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)☆74Updated 6 months ago
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation☆42Updated last month
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆96Updated 2 weeks ago
- Official Implementation of Paper Transfer between Modalities with MetaQueries☆219Updated last month
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆74Updated last month
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"☆120Updated last month
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆83Updated 5 months ago
- Official repository for ReasonGen-R1☆64Updated 2 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆155Updated 3 months ago
- A collection of vision foundation models unifying understanding and generation.☆57Updated 7 months ago
- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization☆21Updated 4 months ago
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.☆34Updated 4 months ago
- Structured Video Comprehension of Real-World Shorts☆177Updated 3 weeks ago
- ☆39Updated this week
- ICML2025☆54Updated 2 weeks ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆59Updated 4 months ago
- Official Implementation of VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention☆39Updated 4 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning☆103Updated 4 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆49Updated 2 months ago
- FQGAN: Factorized Visual Tokenization and Generation☆52Updated 4 months ago
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.☆61Updated 10 months ago
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆56Updated 2 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025☆67Updated 5 months ago
- [NeurIPS 2024] The official implementation of research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆55Updated last month
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding☆55Updated last month
- ☆30Updated 8 months ago
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆84Updated last week
- [ICLR 2024] Official implementation of the paper "Toss: High-quality text-guided novel view synthesis from a single image"☆22Updated last year