ZYiJie / Simple-CLIP
A simple CLIP model implemented on top of open-source pretrained models
☆21 Updated last year
Related projects:
- Image Retrieval ☆27 Updated 2 years ago
- The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi… ☆86 Updated last year
- ☆38 Updated 2 years ago
- ☆52 Updated 8 months ago
- Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI) ☆172 Updated 3 months ago
- Multimodal MM + Chat collection ☆187 Updated 2 weeks ago
- Code for the paper "ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer" ☆25 Updated last year
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding. ☆104 Updated 2 months ago
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer ☆101 Updated 5 months ago
- Official implementation for paper "LightViT: Towards Light-Weight Convolution-Free Vision Transformers" ☆136 Updated 2 years ago
- Official implementation of the paper "OvarNet: Towards Open-vocabulary Object Attribute Recognition" ☆98 Updated last year
- [ICCV 2023] Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection ☆66 Updated 5 months ago
- ☆76 Updated last year
- Code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection" ☆33 Updated 2 years ago
- [CVPR-2023 Workshop@NFVLR] Official PyTorch implementation of Learning CLIP Guided Visual-Text Fusion Transformer for Video-based Pedestr… ☆23 Updated 3 months ago
- Fine-tuning Grounding DINO ☆41 Updated 3 weeks ago
- InstaGen: Enhancing Object Detection by Training on Synthetic Dataset, CVPR2024 ☆68 Updated 5 months ago
- IJCV2023 Instance Segmentation in the Dark ☆73 Updated last month
- A DiT-pytorch implementation, mainly intended for learning the DiT architecture. ☆59 Updated 6 months ago
- [ICLR 2024 poster] Efficient Modulation for Vision Networks ☆46 Updated 2 months ago
- OpenMMLab Semantic Segmentation Toolbox and Benchmark. ☆55 Updated 2 years ago
- ☆123 Updated 8 months ago
- ☆64 Updated 7 months ago
- 1st solution for the Webly-supervised Fine-grained Recognition competition in https://www.cvmart.net/race/10412/base ☆33 Updated last year
- CounTR: Transformer-based Generalised Visual Counting ☆92 Updated 2 months ago
- (CVPR2023/TPAMI2024) Integrally Pre-Trained Transformer Pyramid Networks -- A Hierarchical Vision Transformer for Masked Image Modeling ☆168 Updated last month
- ☆81 Updated 3 years ago
- Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023) ☆256 Updated 6 months ago
- A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023 ☆166 Updated last year
- Modeling Stroke Mask for End-to-End Text Erasing ☆13 Updated last year