ZYiJie / Simple-CLIP
A simple CLIP model implemented on top of open-source pretrained models
☆24Updated 2 years ago
Alternatives and similar repositories for Simple-CLIP
Users that are interested in Simple-CLIP are comparing it to the libraries listed below
- Image Retrieval☆29Updated 3 years ago
- New generation of CLIP with fine-grained discrimination capability, ICML2025☆89Updated this week
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆16Updated 2 months ago
- Research Code for Multimodal-Cognition Team in Ant Group☆144Updated this week
- The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…☆92Updated 2 years ago
- An image captioning model based on ClipCap☆302Updated 3 years ago
- A collection of multimodal (MM) + Chat resources☆262Updated 2 weeks ago
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)☆192Updated 11 months ago
- Fine-tune Stable Diffusion with DreamBooth, LoRA, and ControlNet☆56Updated 2 years ago
- ☆9Updated 7 months ago
- [AAAI 2024] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training☆86Updated last year
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.☆121Updated 3 months ago
- Building a VLM, starting from the basic modules.☆16Updated last year
- A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios☆12Updated last year
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆90Updated 4 months ago
- Fine-tuning Grounding DINO☆100Updated 4 months ago
- InstaGen: Enhancing Object Detection by Training on Synthetic Dataset, CVPR2024☆81Updated last year
- Collection of image and video datasets for generative AI and multimodal visual AI☆28Updated last year
- The official project of paper "Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing"☆63Updated 2 weeks ago
- SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation, 2024☆28Updated last month
- [ECCV 2020 Workshop] VIPriors Object Detection Champion☆44Updated last year
- Official implementation of SPTS: Single-Point Text Spotting (ACM MM 2022 Oral)☆142Updated last year
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆103Updated last year
- [CVPR 2023] Explicit Visual Prompting for Low-Level Structure Segmentations☆204Updated last year
- Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation"☆95Updated 2 years ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆90Updated 6 months ago
- A clip-pytorch model that can be trained on your own dataset.☆227Updated 2 years ago
- Towards Local Visual Modeling for Image Captioning☆28Updated 2 years ago
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"☆240Updated 4 months ago
- A stable-diffusion library.☆125Updated last year