ZYiJie / Simple-CLIP
A simple CLIP model implemented on top of open-source pretrained models
☆29Updated 2 years ago
Alternatives and similar repositories for Simple-CLIP
Users that are interested in Simple-CLIP are comparing it to the libraries listed below
- Image Retrieval☆29Updated 3 years ago
- The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…☆95Updated 2 years ago
- New generation of CLIP with fine grained discrimination capability, ICML2025☆263Updated 2 weeks ago
- A clip-pytorch model that can be trained on your own dataset.☆236Updated 2 years ago
- A stable-diffusion library.☆125Updated 2 years ago
- A collection of multimodal (MM) + Chat resources☆274Updated 2 months ago
- [arXiv'25] Official Implementation of "Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning"☆32Updated last month
- Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection☆91Updated 5 months ago
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)☆193Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group☆162Updated last month
- [ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation☆37Updated 4 months ago
- [CVPR 2023] Explicit Visual Prompting for Low-Level Structure Segmentations☆211Updated last year
- Fine-tuning Grounding DINO☆127Updated 2 weeks ago
- Code for the paper "ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer"☆26Updated last year
- Building a VLM starting from its basic modules.☆17Updated last year
- ☆61Updated 8 months ago
- A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023☆199Updated 2 years ago
- ☆115Updated 3 years ago
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆19Updated 5 months ago
- [ECCV2024] Official implementation of Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes☆89Updated 3 months ago
- OpenMMLab Semantic Segmentation Toolbox and Benchmark.☆55Updated 2 years ago
- Fine-tune Stable Diffusion with DreamBooth, LoRA, and ControlNet☆58Updated 2 years ago
- [AAAI 2024] AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model☆249Updated last year
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.☆130Updated 6 months ago
- A project that generates ancient poems from pictures, using CLIP, T5, and GPT2 models☆21Updated 5 months ago
- Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics (ECCV2024)☆128Updated 11 months ago
- ☆10Updated 9 months ago
- A CLI program for image retrieval using DINOv2☆75Updated 2 years ago
- Repository for the paper "UniQA: Unified Vision-Language Pre-training of Quality and Aesthetics"☆24Updated 5 months ago
- A third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detectio…☆666Updated 2 weeks ago