deepglint / UniME
The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆69 · Updated last week
Alternatives and similar repositories for UniME
Users interested in UniME are comparing it to the repositories listed below.
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆100 · Updated 2 months ago
- Official repository of MMDU dataset ☆90 · Updated 7 months ago
- ☆115 · Updated 9 months ago
- ☆56 · Updated last month
- ☆85 · Updated last year
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆107 · Updated 3 weeks ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 · Updated 7 months ago
- The Next Step Forward in Multimodal LLM Alignment ☆154 · Updated 2 weeks ago
- ✨✨ R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆109 · Updated last week
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ☆43 · Updated last week
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆118 · Updated 2 months ago
- New generation of CLIP with fine-grained discrimination capability (ICML 2025) ☆89 · Updated this week
- ☆73 · Updated last year
- 【NeurIPS 2024】Dense Connector for MLLMs ☆163 · Updated 7 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 · Updated 6 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆75 · Updated 6 months ago
- Official implementation of MIA-DPO ☆57 · Updated 3 months ago
- ☆75 · Updated 4 months ago
- The official repository for the RealSyn dataset ☆32 · Updated 2 weeks ago
- ☆36 · Updated 10 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆34 · Updated 4 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆39 · Updated 7 months ago
- ☆74 · Updated 6 months ago
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ☆157 · Updated last month
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆65 · Updated last week
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated 7 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆56 · Updated 10 months ago
- A collection of visual instruction tuning datasets. ☆76 · Updated last year
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆34 · Updated last month
- Pruning the VLLMs ☆92 · Updated 5 months ago