VectorSpaceLab / MegaPairs
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
☆187 · Updated last month
Alternatives and similar repositories for MegaPairs
Users interested in MegaPairs are comparing it to the repositories listed below.
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆339 · Updated 2 months ago
- Research Code for the Multimodal-Cognition Team in Ant Group ☆151 · Updated last month
- ☆173 · Updated 4 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆242 · Updated 3 months ago
- Repo for "VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforce…" ☆236 · Updated last week
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR25] ☆267 · Updated this week
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆159 · Updated 3 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback ☆281 · Updated 9 months ago
- ☆504 · Updated this week
- The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆75 · Updated last month
- A vLLM-accelerated implementation of GOT, combined with MinerU for PDF parsing in RAG ☆58 · Updated 7 months ago
- The Hugging Face implementation of the Fine-grained Late-interaction Multi-modal Retriever ☆91 · Updated 3 weeks ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆126 · Updated 7 months ago
- ☆363 · Updated 4 months ago
- Training a LLaVA model with better Chinese support, with open-sourced training code and data ☆61 · Updated 9 months ago
- A Survey of Multimodal Retrieval-Augmented Generation ☆18 · Updated 2 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning ☆528 · Updated 2 months ago
- [CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness ☆378 · Updated last month
- DeepSpeed tutorials, annotated examples, and study notes (efficient large-model training) ☆167 · Updated last year
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆78 · Updated 7 months ago
- A Token-level Text Image Foundation Model for Document Understanding ☆96 · Updated last month
- MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka ☆307 · Updated last month
- Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources ☆226 · Updated last month
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆149 · Updated last month
- A collection of every awesome work about R1! ☆386 · Updated last month
- A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, … ☆301 · Updated 3 months ago
- ✨First Open-Source R1-like Video-LLM [2025/02/18] ☆348 · Updated 4 months ago
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. "面壁小钢炮" focuses on achi… ☆247 · Updated 7 months ago
- Official repository of the MMDU dataset ☆92 · Updated 8 months ago
- [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text ☆365 · Updated last month