BIGBALLON / GME-Search
A multimodal image search engine built on the GME model, capable of handling diverse input types. Whether you're querying with text, images, or both, GME-Search provides powerful and flexible image retrieval for arbitrary inputs. Well suited to research and demos.
☆40Updated 2 weeks ago
Alternatives and similar repositories for GME-Search
Users that are interested in GME-Search are comparing it to the libraries listed below
- Chinese CLIP models with SOTA performance.☆55Updated last year
- Building a VLM model starts from the basic module.☆16Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group☆153Updated last month
- ☆26Updated 10 months ago
- ☆68Updated last year
- ☆17Updated 2 weeks ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101Updated last year
- ☆181Updated last year
- Large Multimodal Model☆15Updated last year
- ☆57Updated last year
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆17Updated 4 months ago
- Lion: Kindling Vision Intelligence within Large Language Models☆52Updated last year
- A curated roundup of domestic and international data competition news☆18Updated 3 years ago
- Workshop on Foundation Model 1st foundation model challenge Track1 codebase (Open TransMind v1.0)☆18Updated 2 years ago
- ☆15Updated last month
- ☆32Updated 2 years ago
- Facebook Image Similarity Challenge 2021☆19Updated 3 years ago
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…☆25Updated last year
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM.☆37Updated 9 months ago
- [arXiv: 2505.17163] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning☆53Updated last month
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆103Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆48Updated last month
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)☆188Updated last year
- ☆14Updated 2 years ago
- ☆66Updated last year
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆93Updated 8 months ago
- ☆29Updated 10 months ago
- A new video text spotting framework with Transformer☆77Updated 3 years ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 9 months ago
- ☆29Updated 3 years ago