[ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
☆244Nov 6, 2025Updated 5 months ago
Alternatives and similar repositories for MegaPairs
Users that are interested in MegaPairs are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning☆77May 23, 2025Updated 10 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]☆619Mar 28, 2026Updated last week
- official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"☆42Jul 4, 2025Updated 9 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆274Dec 10, 2025Updated 4 months ago
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c…☆47Nov 16, 2024Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- ☆24Oct 16, 2025Updated 5 months ago
- [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant☆179Jul 7, 2025Updated 9 months ago
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight]☆67Jul 8, 2025Updated 9 months ago
- Toward Universal Multimodal Embedding☆76Aug 1, 2025Updated 8 months ago
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"☆70Dec 8, 2025Updated 4 months ago
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025]☆21Aug 21, 2025Updated 7 months ago
- Comprehensive benchmark for video text understanding☆28Jun 4, 2025Updated 10 months ago
- Empowering RAG with a versatile model-driven data interface for all-purpose applications!☆17Sep 10, 2024Updated last year
- New generation of CLIP with strong fine grained discrimination capability, ICML2025☆563Oct 27, 2025Updated 5 months ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- [CVPR25] CoLLM: A Large Language Model for Composed Image Retrieval☆28Mar 26, 2025Updated last year
- Official implementation of paper "Reason4Rec: Large Language Models for Recommendation with Deliberative User Preference Alignment"☆44Apr 10, 2025Updated last year
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)☆179Oct 1, 2024Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆33Mar 26, 2025Updated last year
- Retrieval and Retrieval-augmented LLMs☆11,502Apr 1, 2026Updated last week
- ☆39Jan 12, 2026Updated 2 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆415May 5, 2025Updated 11 months ago
- 使用Qwen3的Embedding和Reranker模型实现查找与精排☆21Jun 22, 2025Updated 9 months ago
- GPT4V-level open-source multi-modal model based on Llama3-8B☆2,438Mar 3, 2025Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- ☆15Aug 28, 2024Updated last year
- Solve Visual Understanding with Reinforced VLMs☆5,925Mar 12, 2026Updated 3 weeks ago
- 🔥🔥First-ever hour scale video understanding models☆620Jul 14, 2025Updated 8 months ago
- 🤗 HF Downloader (Hugging Face Downloader) 📦 A user-friendly GUI tool for downloading Hugging Face resources with enhanced connectivity…☆13Jan 5, 2025Updated last year
- ☆59Feb 27, 2025Updated last year
- mllm-npu: training multimodal large language models on Ascend NPUs☆94Aug 29, 2024Updated last year
- Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024)☆145Jan 5, 2026Updated 3 months ago
- [NeurIPS 2023] HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception☆44Mar 25, 2024Updated 2 years ago
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.☆1,444Feb 11, 2026Updated last month
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation☆71Oct 17, 2025Updated 5 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024☆21May 30, 2024Updated last year
- AMES: Asymmetric and Memory-Efficient Similarity☆46Aug 12, 2025Updated 7 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training☆224Mar 20, 2025Updated last year
- 基于虚拟仿真环境下的自动驾驶交通标志识别第三名方案☆10Jan 11, 2020Updated 6 years ago
- ☆13Nov 26, 2021Updated 4 years ago