BIGBALLON / GME-Search
A multimodal image search engine built on the GME model, capable of handling diverse input types. Whether you're querying with text, images, or both, it provides powerful and flexible image retrieval for arbitrary inputs. Well suited to research and demos.
☆23Updated 3 months ago
Alternatives and similar repositories for GME-Search:
Users who are interested in GME-Search are comparing it to the repositories listed below:
- Research Code for Multimodal-Cognition Team in Ant Group☆139Updated 8 months ago
- Building a VLM model starts from the basic module.☆14Updated 11 months ago
- ☆21Updated 7 months ago
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆16Updated last month
- ☆16Updated 3 years ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆82Updated 5 months ago
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P…☆25Updated last year
- Chinese CLIP models with SOTA performance.☆54Updated last year
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆38Updated 6 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.☆33Updated last month
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆103Updated last year
- This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training …☆34Updated 5 months ago
- Lion: Kindling Vision Intelligence within Large Language Models☆52Updated last year
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆90Updated 2 months ago
- A new video text spotting framework with Transformer☆77Updated 2 years ago
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding☆54Updated 5 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆64Updated last year
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM☆46Updated 10 months ago
- Workshop on Foundation Model 1st foundation model challenge Track1 codebase (Open TransMind v1.0)☆18Updated 2 years ago
- ☆67Updated last year
- This repo holds the competitions (information, solutions, summaries, memories) that our team has participated in☆25Updated last year
- Contrast-guided Feature Adjustment Module for Visual Information Extraction☆28Updated last year
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c…☆35Updated 4 months ago
- ☆87Updated 9 months ago
- Precision Search through Multi-Style Inputs☆65Updated 8 months ago
- A Dead Simple and Modularized Multi-Modal Training and Finetune Framework. Compatible with any LLaVA/Flamingo/QwenVL/MiniGemini etc series …☆19Updated 11 months ago
- ☆37Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆135Updated 8 months ago
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption☆97Updated last year
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆60Updated 5 months ago