BIGBALLON / GME-Search

A multimodal image search engine built on the GME model that handles diverse input types. Whether you query with text, images, or both, it provides flexible image retrieval from arbitrary inputs. Well suited to research and demos.

☆42

Alternatives and similar repositories for GME-Search

Users that are interested in GME-Search are comparing it to the libraries listed below


alipay / Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
☆154 · Updated last week
zhangfaen / finetune-InternVL2
☆28 · Updated 11 months ago
WatchTower-Liu / VLM-learning
Building a VLM model starts from the basic module.
☆16 · Updated last year
TencentARC-QQ / QA-CLIP
Chinese CLIP models with SOTA performance.
☆55 · Updated last year
thu-ml / zh-clip
☆69 · Updated 2 years ago
large-ocr-model / large-ocr-model.github.io
☆181 · Updated last year
Ucas-HaoranWei / Vary-family
☆57 · Updated last year
360CVGroup / SEEChat
Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM
☆101 · Updated last year
raghavlite / B3
☆17 · Updated last month
VectorSpaceLab / MegaPairs
[ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
☆204 · Updated last month
OpenGVLab / InternVL-MMDetSeg
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
☆94 · Updated 8 months ago
ksOAn6g5 / TaiSu
TaiSu (太素): a large-scale Chinese multimodal dataset (a billion-scale Chinese vision-language pre-training dataset)
☆189 · Updated last year
PCIResearch / TransCore-M
Large Multimodal Model
☆15 · Updated last year
adxcreative / EERCF
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
☆18 · Updated 4 months ago
MUGE-2021 / image-caption-baseline
☆66 · Updated last year
weijiawu / TransVTSpotter
A new video text spotting framework with Transformer
☆77 · Updated 3 years ago
sandy1990418 / Finetune-Qwen2.5-VL
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.
☆100 · Updated 5 months ago
friedrichor / UNITE
Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"
☆25 · Updated last week
QQ-MM / QQMM-embed
☆15 · Updated last month
Ucas-HaoranWei / Vary-tiny-600k
Vary-tiny codebase built on LAVIS (for training from scratch), plus a PDF image-text pair dataset (about 600k pairs, English and Chinese)
☆84 · Updated 9 months ago
mynameischaos / Lion
Lion: Kindling Vision Intelligence within Large Language Models
☆52 · Updated last year
ShowMeAI-Hub / image_retrieval
image retrieval systems based on CNN feature distance and triplet loss
☆31 · Updated 3 years ago
WePOINTS / WePOINTS
☆173 · Updated 5 months ago
scenarios / WeMM
☆87 · Updated last year
MUGE-2021 / image-retrieval-baseline
☆59 · Updated 2 years ago
zhanxlin / Product1M
Product1M
☆87 · Updated 2 years ago
MAEHCM / ICL-D3IE
Code for the ICCV 2023 paper “ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction”
☆53 · Updated last year
yuxie11 / R2D2
☆163 · Updated last year
jinbo0906 / Awesome-MLLM-Datasets
This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …
☆49 · Updated 2 months ago
yh-hust / PDF-Wukong
[arXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
☆121 · Updated last month