BIGBALLON / GME-Search
A multimodal image search engine built on the GME model, capable of handling diverse input types. Whether you're querying with text, images, or both, it provides powerful and flexible image retrieval over arbitrary inputs. Perfect for research and demos.
☆38Updated 5 months ago
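For a concrete sense of what querying with "text, images, or both" looks like, here is a minimal retrieval sketch. It substitutes a CLIP checkpoint loaded through sentence-transformers for the GME model (whose exact loading code isn't shown on this page), and the `gallery/` folder, `search()` helper, and averaging-based fusion are illustrative assumptions rather than GME-Search's actual implementation.

```python
# Minimal sketch of text / image / combined query retrieval.
# NOTE: uses a CLIP checkpoint via sentence-transformers as a stand-in
# embedder; GME-Search itself builds on the GME model, whose API differs.
from pathlib import Path

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # stand-in for a GME encoder

# Embed the image gallery once (hypothetical ./gallery folder).
paths = sorted(Path("gallery").glob("*.jpg"))
gallery = model.encode([Image.open(p) for p in paths], normalize_embeddings=True)

def search(text=None, image=None, top_k=5):
    """Rank gallery images against a text query, an image query, or both."""
    parts = []
    if text is not None:
        parts.append(model.encode([text], normalize_embeddings=True)[0])
    if image is not None:
        parts.append(model.encode([Image.open(image)], normalize_embeddings=True)[0])
    # Simple fusion by averaging; GME fuses text+image natively instead.
    query = np.mean(parts, axis=0)
    query /= np.linalg.norm(query)
    scores = gallery @ query  # cosine similarity (embeddings are normalized)
    best = np.argsort(-scores)[:top_k]
    return [(str(paths[i]), float(scores[i])) for i in best]

# Example queries: text-only, image-only, or combined.
print(search(text="a red sports car"))
print(search(image="query.jpg"))
print(search(text="the same car but at night", image="query.jpg"))
```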
Alternatives and similar repositories for GME-Search
Users interested in GME-Search are comparing it to the repositories listed below
- Building a VLM model, starting from the basic modules.☆16Updated last year
- ☆25Updated 9 months ago
- Research Code for Multimodal-Cognition Team in Ant Group☆147Updated 2 weeks ago
- Chinese CLIP models with SOTA performance.☆55Updated last year
- ☆68Updated last year
- ☆15Updated last week
- ☆56Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆47Updated last month
- Our 2nd-gen LMM☆33Updated last year
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆16Updated 3 months ago
- Large Multimodal Model☆15Updated last year
- Workshop on Foundation Models, 1st foundation model challenge, Track 1 codebase (Open TransMind v1.0)☆18Updated 2 years ago
- ☆179Updated last year
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆40Updated 8 months ago
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM☆101Updated last year
- A multimodal large model implemented from scratch, named Reyes (睿视; R: 睿, eyes: 眼). Reyes has 8B parameters, uses InternViT-300M-448px-V2_5 as its vision encoder and Qwen2.5-7B-Instruct on the language side, and connects the two through a two-layer MLP projection…☆13Updated 3 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.☆78Updated 3 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 8 months ago
- MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval☆183Updated 2 weeks ago
- ☆13Updated 2 weeks ago
- A simple MLLM that surpassed QwenVL-Max using open-source data only, with a 14B LLM.☆37Updated 8 months ago
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆103Updated last year
- ChineseCLIP using online learning☆13Updated 2 years ago
- ☆32Updated 2 years ago
- MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. A comprehensive evaluation of multimodal large model multilingua…☆59Updated 3 weeks ago
- A Token-level Text Image Foundation Model for Document Understanding☆92Updated last month
- Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k samples, English/Chinese)☆83Updated 8 months ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆91Updated 7 months ago
- Training a LLaVA model with better Chinese support, with the training code and data open-sourced.☆60Updated 9 months ago
- ☆16Updated 3 years ago