friedrichor / UNITE
Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"
☆25 Updated 2 weeks ago
Alternatives and similar repositories for UNITE
Users who are interested in UNITE are comparing it to the repositories listed below:
- ☆87 Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group ☆154 Updated last week
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 Updated last year
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆72 Updated 9 months ago
- ☆37 Updated last year
- Official implementation of paper AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding ☆74 Updated 2 months ago
- Precision Search through Multi-Style Inputs ☆71 Updated 2 months ago
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption" ☆35 Updated last month
- ☆69 Updated 2 years ago
- LMM solved catastrophic forgetting, AAAI2025 ☆44 Updated 3 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆26 Updated last year
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆80 Updated last week
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆48 Updated 11 months ago
- ☆115 Updated 11 months ago
- ☆85 Updated last year
- A Simple Framework of Small-scale LMMs for Video Understanding ☆72 Updated last month
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning ☆28 Updated last month
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆58 Updated 9 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆62 Updated 8 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 9 months ago
- ☆42 Updated last month
- [ICCV2025] A Token-level Text Image Foundation Model for Document Understanding ☆109 Updated 2 weeks ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024) ☆164 Updated 11 months ago
- ☆29 Updated 10 months ago
- Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning ☆94 Updated 2 weeks ago
- 【NeurIPS 2024】Dense Connector for MLLMs ☆171 Updated 9 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆49 Updated last year
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆92 Updated last month
- Narrative movie understanding benchmark ☆73 Updated last month
- [ACM MM2025] The official repository for the RealSyn dataset ☆35 Updated last week