shonenkov / CLIP-ODS

CLIP Object Detection: search for objects in an image using natural language #Zeroshot #Unsupervised #CLIP #ODS

☆140

Alternatives and similar repositories for CLIP-ODS

Users who are interested in CLIP-ODS are comparing it to the repositories listed below.

allenai / gpv-1
A task-agnostic vision-language architecture as a step towards General Purpose Vision
☆92 Updated 4 years ago
salesforce / MUST
PyTorch code for MUST
☆107 Updated 5 months ago
facebookresearch / paco
This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts…
☆286 Updated last year
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138 Updated 2 years ago
facebookresearch / SWAG
Official repository for "Revisiting Weakly Supervised Pre-Training of Visual Perception Models". https://arxiv.org/abs/2201.08371.
☆180 Updated 3 years ago
mlfoundations / imagenet-captions
Release of ImageNet-Captions
☆51 Updated 2 years ago
hila-chefer / RobustViT
[NeurIPS 2022] Official PyTorch implementation of Optimizing Relevance Maps of Vision Transformers Improves Robustness. This code allows …
☆133 Updated 2 years ago
vlfom / RNCDL
[NeurIPS 2022] The official implementation of "Learning to Discover and Detect Objects".
☆111 Updated 2 years ago
LightDXY / FT-CLIP
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
☆223 Updated 2 years ago
FocalNet / FocalNet-DINO
This repo contains the code and configuration files for reproducing object detection results of FocalNets with DINO
☆67 Updated 2 years ago
TheoCoombes / ClipCap
Using pretrained encoder and language models to generate captions from multimedia inputs.
☆97 Updated 2 years ago
mmaaz60 / mvits_for_class_agnostic_od
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
☆313 Updated 2 years ago
facebookresearch / Generic-Grouping
Open-source code for Generic Grouping Network (GGN, CVPR 2022)
☆111 Updated 2 months ago
lucidrains / uniformer-pytorch
Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks, de…
☆102 Updated 3 years ago
fkodom / clip-text-decoder
Generate text captions for images from their embeddings.
☆115 Updated 2 years ago
Zasder3 / train-CLIP-FT
☆47 Updated 4 years ago
valeoai / LOST
PyTorch implementation of the LOST unsupervised object discovery method
☆251 Updated 2 years ago
ZhangYuanhan-AI / Bamboo
[IJCV] Bamboo: 4 times larger than ImageNet; 2 times larger than Object365; Built by active learning.
☆181 Updated last year
naver-ai / vidt
☆313 Updated 3 years ago
ShihaoShao-GH / 1st-Place-Solution-in-Google-Universal-Image-Embedding
1st Place Solution in Google Universal Image Embedding
☆67 Updated 2 years ago
OpenGVLab / M3I-Pretraining
[CVPR 2023] implementation of Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.
☆91 Updated 2 years ago
kevinzakka / clip_playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
☆174 Updated 3 years ago
NVlabs / MinVIS
☆275 Updated 10 months ago
facebookresearch / active_indexing
Official implementation of "Active Image Indexing"
☆59 Updated 2 years ago
BCV-Uniandes / PNG
☆61 Updated 4 years ago
iejMac / clip-video-encode
Easily compute CLIP embeddings from video frames
☆146 Updated last year
hanoonaR / object-centric-ovd
[NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary …
☆296 Updated 3 years ago
HendrikStrobelt / miniClip
☆47 Updated 5 months ago
iejMac / video2numpy
Optimized library for large-scale extraction of frames and audio from video.
☆204 Updated 2 years ago
hirl-team / HIRL
HIRL: A General Framework for Hierarchical Image Representation Learning (http://arxiv.org/abs/2205.13159)
☆40 Updated 3 years ago