BIGBALLON / GME-Search
A multimodal image search engine built on the GME model, capable of handling diverse input types. Whether you query with text, images, or both, it provides flexible image retrieval for arbitrary inputs. Well suited to research and demos.
☆35Updated 4 months ago
Alternatives and similar repositories for GME-Search
Users who are interested in GME-Search are comparing it to the repositories listed below.
- Building a VLM model starts from the basic module.☆16Updated last year
- ☆24Updated 9 months ago
- Research Code for Multimodal-Cognition Team in Ant Group☆143Updated 10 months ago
- Chinese CLIP models with SOTA performance.☆55Updated last year
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…☆25Updated last year
- Large Multimodal Model☆15Updated last year
- ☆67Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆43Updated last week
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆39Updated 7 months ago
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆16Updated 2 months ago
- OpenCompatible provides a standard compatible training benchmark, covering practical training scenarios.☆25Updated 2 years ago
- ☆16Updated 3 years ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.☆64Updated 3 months ago
- 【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling☆116Updated 6 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆60Updated 6 months ago
- The official implementation of RAR☆87Updated last year
- ☆36Updated 10 months ago
- ☆56Updated last year
- ☆87Updated 10 months ago
- ☆177Updated last year
- New generation of CLIP with fine grained discrimination capability, ICML2025☆89Updated this week
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)☆181Updated last year
- TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision [AAAI2023 Oral]☆54Updated 2 years ago
- A new video text spotting framework with Transformer☆77Updated 2 years ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆90Updated 4 months ago
- MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval☆167Updated this week
- HHH☆34Updated 3 years ago
- A Survey of Multimodal Retrieval-Augmented Generation☆18Updated last month
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM☆46Updated 11 months ago
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆103Updated last year