xiaoxing2001 / DeGLA
[ACM MM25] Official PyTorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]
☆13Updated last month
Alternatives and similar repositories for DeGLA
Users that are interested in DeGLA are comparing it to the libraries listed below
- [ICCV 2025] Dynamic-VLM☆23Updated 8 months ago
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"☆83Updated last week
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆30Updated 4 months ago
- SFT+RL boosts multimodal reasoning☆25Updated last month
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".☆17Updated 3 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆28Updated 3 months ago
- [ACM MM2025] The official repository for the RealSyn dataset☆37Updated last month
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆16Updated 6 months ago
- ☆16Updated 3 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆42Updated 4 months ago
- 🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆31Updated 3 weeks ago
- ☆32Updated 4 months ago
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding☆36Updated 5 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆24Updated 10 months ago
- Unsupervised GRPO☆41Updated 2 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 10 months ago
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"☆49Updated last month
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆79Updated 9 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆56Updated last year
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆37Updated last month
- LMM solved catastrophic forgetting, AAAI2025☆44Updated 4 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆55Updated 8 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Updated 9 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆57Updated 10 months ago
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆25Updated this week
- ☆73Updated last year
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆48Updated 7 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆19Updated 5 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆26Updated 7 months ago
- Official implementation of MIA-DPO☆64Updated 6 months ago