ZYiJie / Simple-CLIP
A simple CLIP model implemented on top of open-source pretrained models
☆30Updated 2 years ago
Alternatives and similar repositories for Simple-CLIP
Users who are interested in Simple-CLIP are comparing it to the libraries listed below.
- Building a VLM, starting from the basic modules.☆18Updated last year
- New generation of CLIP with fine-grained discrimination capability, ICML 2025☆297Updated last week
- A clip-pytorch model that can be trained on your own dataset.☆241Updated 2 years ago
- Image Retrieval☆29Updated 3 years ago
- A collection of multimodal (MM) + Chat resources☆277Updated last month
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)☆198Updated last year
- A project that can generate ancient poems based on pictures, including CLIP, T5, GPT2 models☆22Updated 7 months ago
- The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…☆95Updated 2 years ago
- This is code of paper "ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer"☆26Updated 2 years ago
- Research Code for Multimodal-Cognition Team in Ant Group☆165Updated 2 months ago
- [ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation☆39Updated 6 months ago
- ☆61Updated 4 months ago
- ☆10Updated 11 months ago
- ☆67Updated 5 months ago
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation☆130Updated 10 months ago
- A stable-diffusion library.☆125Updated 2 years ago
- [ECCV2024] Official implementation of Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes☆90Updated 4 months ago
- Official Implementation of "Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning"☆45Updated 2 months ago
- Fine-tuning Grounding DINO☆132Updated last month
- An image captioning model based on ClipCap☆313Updated 3 years ago
- ☆65Updated 10 months ago
- [AAAI 2025 (Oral)] SAIL: Sample-Centric In-Context Learning for Document Information Extraction☆18Updated 8 months ago
- Official implementation of ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining (AAAI 20…☆57Updated last year
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".☆251Updated last year
- This is the repository for paper "UniQA: Unified Vision-Language Pre-training of Quality and Aesthetics"☆24Updated 6 months ago
- ☆30Updated last year
- Use 2 lines to empower absolute time awareness for Qwen2.5VL's MRoPE☆18Updated this week
- Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection☆94Updated 6 months ago
- [AAAI 2024] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training☆101Updated last year
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer☆104Updated last year