VectorSpaceLab / MegaPairs
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
☆128 Updated last week
Alternatives and similar repositories for MegaPairs:
Users who are interested in MegaPairs are comparing it to the repositories listed below.
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆278 Updated last week
- Research Code for Multimodal-Cognition Team in Ant Group ☆138 Updated 8 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆224 Updated last month
- ☆172 Updated last month
- This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR25] ☆168 Updated last week
- ☆78 Updated 10 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆83 Updated 2 months ago
- ☆67 Updated last year
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆117 Updated 4 months ago
- code for piccolo embedding model from SenseTime ☆123 Updated 10 months ago
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi… ☆223 Updated 5 months ago
- Train a LLaVA model with better Chinese support, and open-source the training code and data. ☆53 Updated 6 months ago
- Collect every awesome work about r1! ☆306 Updated last week
- 🔥🔥First-ever hour scale video understanding models ☆259 Updated this week
- A vLLM-accelerated implementation of GOT, combined with MinerU for PDF parsing in RAG ☆51 Updated 4 months ago
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation ☆116 Updated 4 months ago
- ☆340 Updated last month
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆113 Updated 4 months ago
- 1st Solution For Conversational Multi-Doc QA Workshop & International Challenge @ WSDM'24 - Xiaohongshu.Inc ☆162 Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 Updated 4 months ago
- ☆131 Updated last year
- The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆148 Updated last week
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆141 Updated 9 months ago
- MMR1: Advancing the Frontiers of Multimodal Reasoning ☆145 Updated last week
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM ☆100 Updated 10 months ago
- Applications of large language models and multimodal models, mainly covering RAG, small models, agents, cross-modal search, OCR, and more ☆157 Updated 4 months ago
- ☆225 Updated 10 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊 ☆265 Updated 2 months ago
- ☆56 Updated last year
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆464 Updated last week