ZYiJie / Simple-CLIP
A simple CLIP model implemented on top of open-source pretrained models
☆28Updated 2 years ago
Alternatives and similar repositories for Simple-CLIP
Users who are interested in Simple-CLIP are comparing it to the repositories listed below
- Image Retrieval☆29Updated 3 years ago
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.☆126Updated 5 months ago
- The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…☆94Updated 2 years ago
- New generation of CLIP with fine-grained discrimination capability, ICML 2025☆203Updated last month
- Fine-tune Stable Diffusion with DreamBooth, LoRA, and ControlNet☆57Updated 2 years ago
- [ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation☆30Updated 3 months ago
- [AAAI 2024] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training☆92Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group☆153Updated last month
- A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023☆195Updated 2 years ago
- Official implementation of the paper "OvarNet: Towards Open-vocabulary Object Attribute Recognition"☆104Updated 2 years ago
- Fine-tuning Grounding DINO☆113Updated 6 months ago
- [CVPR-2023 Workshop@NFVLR] Official PyTorch implementation of Learning CLIP Guided Visual-Text Fusion Transformer for Video-based Pedestr…☆28Updated 3 months ago
- InstaGen: Enhancing Object Detection by Training on Synthetic Dataset, CVPR2024☆81Updated last year
- Text-To-Image Generation with Chinese Characters☆21Updated last year
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆91Updated 5 months ago
- Implementation of PyramidCLIP(NeurIPS2022).☆31Updated 2 years ago
- ☆60Updated 7 months ago
- [ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction☆189Updated last year
- Collection of image and video datasets for generative AI and multimodal visual AI☆29Updated last year
- Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation"☆97Updated 2 years ago
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…☆97Updated this week
- Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection☆88Updated 3 months ago
- ☆13Updated 10 months ago
- A clip-pytorch model that can be trained on your own dataset.☆232Updated 2 years ago
- [AAAI 2023] DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding☆56Updated 2 years ago
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆103Updated last year
- Building a VLM from its basic modules.☆16Updated last year
- [NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models☆321Updated last year
- Search photos on Unsplash based on OpenAI's CLIP model; supports search with joint image+text queries and attention visualization.☆222Updated 3 years ago
- ☆9Updated 8 months ago