friedrichor / UNITE
Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"
☆30 Updated last month
Alternatives and similar repositories for UNITE
Users who are interested in UNITE are comparing it to the libraries listed below
- ☆37 Updated last year
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 Updated last year
- ☆87 Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆150 Updated last year
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning ☆33 Updated last month
- Research Code for Multimodal-Cognition Team in Ant Group ☆161 Updated last month
- 【NeurIPS 2024】Dense Connector for MLLMs ☆171 Updated 9 months ago
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning ☆63 Updated 2 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆93 Updated 2 months ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆73 Updated 9 months ago
- ☆66 Updated last year
- Official repository of MMDU dataset ☆93 Updated 10 months ago
- ☆119 Updated last year
- Narrative movie understanding benchmark ☆73 Updated last month
- ☆76 Updated 8 months ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆30 Updated 4 months ago
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆83 Updated last month
- ☆91 Updated last year
- [ICCV'25] Explore the Limits of Omni-modal Pretraining at Scale ☆114 Updated 11 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆48 Updated last year
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c… ☆41 Updated 8 months ago
- ☆133 Updated last year
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 Updated 10 months ago
- Official implementation of paper AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding ☆77 Updated 3 months ago
- A Simple Framework of Small-scale LMMs for Video Understanding ☆73 Updated last month
- ☆86 Updated last year
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆46 Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆267 Updated last year
- A collection of visual instruction tuning datasets. ☆76 Updated last year
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆79 Updated 8 months ago