BIGBALLON / GME-Search
A multimodal image search engine built on the GME model, capable of handling diverse input types. Whether you're querying with text, images, or both, it provides powerful and flexible image retrieval under arbitrary inputs. Perfect for research and demos.
☆42 · Updated last month
Alternatives and similar repositories for GME-Search
Users who are interested in GME-Search are comparing it to the libraries listed below.
- Research Code for Multimodal-Cognition Team in Ant Group ☆154 · Updated last week
- ☆28 · Updated 11 months ago
- Building a VLM model starts from the basic module. ☆16 · Updated last year
- Chinese CLIP models with SOTA performance. ☆55 · Updated last year
- ☆69 · Updated 2 years ago
- ☆181 · Updated last year
- ☆57 · Updated last year
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM ☆101 · Updated last year
- ☆17 · Updated last month
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆204 · Updated last month
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed ☆94 · Updated 8 months ago
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset) ☆189 · Updated last year
- Large Multimodal Model ☆15 · Updated last year
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning ☆18 · Updated 4 months ago
- ☆66 · Updated last year
- A new video text spotting framework with Transformer ☆77 · Updated 3 years ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support. ☆100 · Updated 5 months ago
- Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval" ☆25 · Updated last week
- ☆15 · Updated last month
- Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, including English/Chinese) ☆84 · Updated 9 months ago
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 · Updated last year
- Image retrieval systems based on CNN feature distance and triplet loss ☆31 · Updated 3 years ago
- ☆173 · Updated 5 months ago
- ☆87 · Updated last year
- ☆59 · Updated 2 years ago
- Product1M ☆87 · Updated 2 years ago
- Code for ICCV 2023 paper: “ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction” ☆53 · Updated last year
- ☆163 · Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ☆49 · Updated 2 months ago
- [arXiv] PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling ☆121 · Updated last month