xiaoxing2001 / DeGLA
[ACM MM25] Official PyTorch implementation of "Decoupled Global-Local Alignment for Improving Compositional Understanding"
☆13Updated 2 months ago
Alternatives and similar repositories for DeGLA
Users who are interested in DeGLA are comparing it to the repositories listed below
- [ICCV 2025] Dynamic-VLM☆25Updated 9 months ago
- [EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"☆14Updated last week
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆31Updated 5 months ago
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"☆87Updated last month
- SFT+RL boosts multimodal reasoning☆30Updated 2 months ago
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆25Updated last month
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆29Updated this week
- [ACM MM2025] The official repository for the RealSyn dataset☆37Updated 2 months ago
- LMM solved catastrophic forgetting, AAAI2025☆44Updated 5 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆58Updated 10 months ago
- [ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models☆43Updated 8 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆57Updated 9 months ago
- ☆32Updated 5 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆16Updated 7 months ago
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding☆36Updated 6 months ago
- ☆19Updated 4 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"☆20Updated 11 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆40Updated 2 months ago
- Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning☆37Updated 2 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆78Updated last month
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆41Updated 5 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆56Updated last year
- On Path to Multimodal Generalist: General-Level and General-Bench☆19Updated 2 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆24Updated 11 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆82Updated 10 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models☆19Updated 7 months ago
- Official implementation of MIA-DPO☆65Updated 7 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆57Updated 11 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆28Updated 8 months ago