sugarandgugu / Text2Image-Retrieval
Computer vision course project: an image-text retrieval system based on Chinese-CLIP
☆89 Updated 2 years ago
Alternatives and similar repositories for Text2Image-Retrieval
Users that are interested in Text2Image-Retrieval are comparing it to the libraries listed below
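- This project retrieves images that match an input text description. ☆40 Updated last year
- Learning Semantic Relationship among Instances for Image-Text Matching, CVPR, 2023 ☆90 Updated 2 months ago
- Web image-text matching based on multimodal retrieval ☆14 Updated last year
- 2024.06.19: This project uses Chinese-CLIP to build a text-to-image and image-to-image search page, aiming to help users get started quickly with cross-modal retrieval. The code uses roughly 190,000 (189,585) images from the MUGE dataset as the gallery. It provides feature extraction, retrieval, and UI code. ☆17 Updated last year
- Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment, CVPR, 2024 ☆95 Updated 2 months ago
- Chinese CLIP: supports custom datasets and extracts vectors from text and images to perform text-image matching. ☆22 Updated 2 years ago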
- Cross-modal few-shot adaptation with CLIP ☆343 Updated last month
- [CVPR'2023 Highlight & TPAMI] Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? ☆238 Updated 6 months ago
- Graduation project: "Design and Implementation of Video-Text Retrieval Based on the CLIP Model" ☆10 Updated 11 months ago
- The llava-Qwen2-7B-Instruct-Chinese-CLIP model improves Chinese text recognition and understanding of the meaning behind memes, approaching the recognition level of gpt4o and claude-3.5-sonnet! ☆23 Updated 11 months ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆153 Updated last month
- Build a simple basic multimodal large model from scratch. 🤖 ☆41 Updated last year
- ☆48 Updated last year
- Personal Project: MPP-Qwen14B & MPP-Qwen-Next(Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conv… ☆454 Updated 3 months ago
- [AAAI'24 Oral] LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network ☆40 Updated last year
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆120 Updated 7 months ago
- A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources. ☆246 Updated 2 weeks ago
- ☆64 Updated 2 months ago
- A project that can generate ancient poems based on pictures, including CLIP, T5, GPT2 models ☆22 Updated 4 months ago
- This project uses the LLaVA 1.6 multimodal model to implement text-to-image and image-to-image search. ☆23 Updated last year
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception ☆570 Updated last year
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024) ☆67 Updated last year
- Deploys Chinese CLIP with OpenCV + onnxruntime for text-to-image search: given a one-sentence description of the desired image, it retrieves matching images from a gallery. Includes both C++ and Python versions. ☆75 Updated last year
- The official implementation of RAR ☆88 Updated last year
- A super easy CLIP model with the MNIST dataset for study ☆121 Updated last year
- Object detection using YOLOv8 as the baseline model on the VisDrone2019 dataset, with the author's own improvement strategies. ☆96 Updated 11 months ago
- Building a VLM model starts from the basic module. ☆16 Updated last year
- A clip-pytorch model that can be trained on your own dataset. ☆232 Updated 2 years ago
- [ICML 2025] Official repository for paper "Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation" ☆156 Updated last month
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions ☆261 Updated last year