QQ-MM / QQMM-embed
☆15 Updated last month
Alternatives and similar repositories for QQMM-embed
Users who are interested in QQMM-embed are comparing it to the repositories listed below.
- ☆17 Updated last month
- ☆69 Updated 2 years ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆154 Updated last week
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c… ☆40 Updated 8 months ago
- ☆42 Updated last month
- TaiSu (太素) -- a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset) ☆189 Updated last year
- official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval" ☆25 Updated 2 weeks ago
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 Updated last year
- ☆163 Updated last year
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆40 Updated 9 months ago
- Precision Search through Multi-Style Inputs ☆71 Updated 2 months ago
- ☆70 Updated last month
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆78 Updated 8 months ago
- The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption" ☆35 Updated last month
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning ☆62 Updated last month
- ☆37 Updated last year
- ☆87 Updated last year
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks ☆297 Updated last year
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆80 Updated last week
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆204 Updated last month
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation ☆129 Updated 8 months ago
- ☆29 Updated 10 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆92 Updated last month
- [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives ☆34 Updated 2 weeks ago
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆266 Updated last year
- Product1M ☆87 Updated 2 years ago
- BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild ☆30 Updated last year
- Narrative movie understanding benchmark ☆73 Updated last month
- Official repository of MMDU dataset ☆92 Updated 9 months ago
- Bling's Object detection tool ☆56 Updated 2 years ago