[ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
☆246 · Nov 6, 2025 · Updated 5 months ago
Alternatives and similar repositories for MegaPairs
Users that are interested in MegaPairs are comparing it to the libraries listed below.
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning ☆77 · May 23, 2025 · Updated 11 months ago
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025] ☆635 · Updated this week
- Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval" ☆42 · Jul 4, 2025 · Updated 9 months ago
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆275 · Dec 10, 2025 · Updated 4 months ago
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c… ☆48 · Nov 16, 2024 · Updated last year
- [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant ☆181 · Jul 7, 2025 · Updated 9 months ago
- ☆24 · Oct 16, 2025 · Updated 6 months ago
- [ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆105 · Dec 8, 2025 · Updated 4 months ago
- Not a neutral survey — a field manual for engineers who build, train, and ship multimodal retrieval at production scale. The C-L-I triang… ☆77 · Apr 20, 2026 · Updated last week
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight] ☆69 · Jul 8, 2025 · Updated 9 months ago
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning" ☆71 · Dec 8, 2025 · Updated 4 months ago
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025] ☆21 · Aug 21, 2025 · Updated 8 months ago
- Comprehensive benchmark for video text understanding ☆28 · Jun 4, 2025 · Updated 10 months ago
- Empowering RAG with a versatile model-driven data interface for all-purpose applications! ☆17 · Sep 10, 2024 · Updated last year
- [CVPR25] CoLLM: A Large Language Model for Composed Image Retrieval ☆28 · Mar 26, 2025 · Updated last year
- Official implementation of paper "Reason4Rec: Large Language Models for Recommendation with Deliberative User Preference Alignment" ☆44 · Apr 10, 2025 · Updated last year
- Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024) ☆179 · Oct 1, 2024 · Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆33 · Mar 26, 2025 · Updated last year
- New generation of CLIP with strong fine grained discrimination capability, ICML2025 ☆749 · Oct 27, 2025 · Updated 6 months ago
- ☆39 · Jan 12, 2026 · Updated 3 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆419 · May 5, 2025 · Updated 11 months ago
- Collection of Composed Image Retrieval (CIR) papers. ☆331 · Updated this week
- ☆12 · Oct 3, 2023 · Updated 2 years ago
- Retrieval and reranking implemented with Qwen3's Embedding and Reranker models ☆21 · Jun 22, 2025 · Updated 10 months ago
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,437 · Mar 3, 2025 · Updated last year
- ☆15 · Aug 28, 2024 · Updated last year
- LLM2CLIP significantly improves already state-of-the-art CLIP models. ☆651 · Feb 1, 2026 · Updated 2 months ago
- Solve Visual Understanding with Reinforced VLMs ☆5,950 · Mar 12, 2026 · Updated last month
- 🔥🔥First-ever hour scale video understanding models ☆621 · Jul 14, 2025 · Updated 9 months ago
- 🤗 HF Downloader (Hugging Face Downloader) 📦 A user-friendly GUI tool for downloading Hugging Face resources with enhanced connectivity… ☆13 · Jan 5, 2025 · Updated last year
- ☆59 · Feb 27, 2025 · Updated last year
- [NeurIPS 2023] HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception ☆44 · Mar 25, 2024 · Updated 2 years ago
- Official Pytorch implementation of LinCIR: Language-only Training of Zero-shot Composed Image Retrieval (CVPR 2024) ☆145 · Jan 5, 2026 · Updated 3 months ago
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆1,447 · Feb 11, 2026 · Updated 2 months ago
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation ☆72 · Oct 17, 2025 · Updated 6 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆159 · Dec 6, 2024 · Updated last year
- Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024 ☆21 · May 30, 2024 · Updated last year
- AMES: Asymmetric and Memory-Efficient Similarity ☆47 · Aug 12, 2025 · Updated 8 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆224 · Mar 20, 2025 · Updated last year