Longin-Yu / ComRoPE
☆12 · Updated last month
Alternatives and similar repositories for ComRoPE
Users interested in ComRoPE are comparing it to the libraries listed below.
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation ☆15 · Updated 5 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆36 · Updated last month
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆26 · Updated 3 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆24 · Updated 9 months ago
- ☆16 · Updated 2 months ago
- ☆12 · Updated 6 months ago
- Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More ☆23 · Updated 5 months ago
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval ☆19 · Updated last month
- Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models" ☆17 · Updated 4 months ago
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision ☆24 · Updated 2 months ago
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆17 · Updated 10 months ago
- ☆12 · Updated 4 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆26 · Updated 7 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆69 · Updated 3 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆57 · Updated 9 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆17 · Updated last month
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆52 · Updated last month
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆65 · Updated last month
- Official code for the paper "GRIT: Teaching MLLMs to Think with Images" ☆115 · Updated this week
- CLIP-MoE: Mixture of Experts for CLIP ☆42 · Updated 10 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆180 · Updated last month
- [ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs ☆51 · Updated 2 weeks ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆27 · Updated this week
- [ICCV 2025] Dynamic-VLM ☆23 · Updated 7 months ago
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models ☆26 · Updated last month
- ☆93 · Updated 4 months ago
- ☆23 · Updated 4 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆45 · Updated last month
- 🚀 Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆28 · Updated last month
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆54 · Updated 2 weeks ago