XMUDeepLIT / UME-R1
☆36 · Updated 3 weeks ago

Alternatives and similar repositories for UME-R1
Users interested in UME-R1 are comparing it to the libraries listed below.
- [ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" · ☆97 · Updated last month
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning · ☆75 · Updated 7 months ago
- Code for DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models · ☆75 · Updated 5 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) · ☆72 · Updated 2 years ago
- [Paper][AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations · ☆152 · Updated last year
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" · ☆85 · Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge · ☆153 · Updated 4 months ago
- ☆56 · Updated last month
- MoCLE (first MLLM with MoE for instruction customization and generalization) (https://arxiv.org/abs/2312.12379) · ☆45 · Updated 6 months ago
- ☆25 · Updated last year
- The official implementation of RAR · ☆92 · Updated last month
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning · ☆50 · Updated last year
- [CVPR 2024] Official code for the paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" · ☆145 · Updated last year
- MMICL, a state-of-the-art VLM from PKU with in-context learning (ICL) ability · ☆50 · Updated 5 months ago
- [NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model · ☆89 · Updated 2 years ago
- Official repository of the MMDU dataset · ☆102 · Updated last year
- ☆21 · Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension · ☆60 · Updated last year
- [IEEE TMM 2025 & ACL 2024 Findings] LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition · ☆36 · Updated 5 months ago
- [ICLR 2023] Code repo for the ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…" · ☆53 · Updated last year
- ☆87 · Updated last year
- ☆37 · Updated last year
- [NeurIPS'24] Official PyTorch implementation of "Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment" · ☆59 · Updated last year
- Evaluation code and datasets for the ACL 2024 paper "VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval". The original c… · ☆46 · Updated last year
- Official repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning" · ☆58 · Updated last month
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling · ☆143 · Updated 4 months ago
- Visual Instruction Tuning for Qwen2 Base Model · ☆41 · Updated last year
- [ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" · ☆53 · Updated 2 years ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models · ☆77 · Updated last year
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023) · ☆166 · Updated last year