inuwamobarak / OWLv2
Introducing OWLv2: Google's Breakthrough in Zero-Shot Object Detection
☆17 Updated last year
Alternatives and similar repositories for OWLv2:
Users who are interested in OWLv2 are comparing it to the libraries listed below.
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆35 Updated last year
- EfficientSAM + YOLO World base model for use with Autodistill. ☆10 Updated last year
- SAM-CLIP module for use with Autodistill. ☆15 Updated last year
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 Updated last year
- EdgeSAM model for use with Autodistill. ☆26 Updated 10 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆63 Updated 8 months ago
- All You Need to Know About Image Retrieval: a repo to automagically download datasets and run experiments ☆55 Updated last month
- ☆28 Updated 3 years ago
- Vision-oriented multimodal AI ☆49 Updated 10 months ago
- Converts CLIP models to ONNX ☆11 Updated 2 years ago
- ☆11 Updated 2 years ago
- Official code for infimm-hd ☆16 Updated 7 months ago
- Facebook Image Similarity Challenge 2021 ☆19 Updated 3 years ago
- Text-Guided Generation of Full-Body Image with Preserved Reference Face for Customized Animation ☆23 Updated 10 months ago
- Official code repository for the WACV 2022 paper "Visualizing Paired Image Similarity in Transformer Networks" ☆22 Updated 3 years ago
- Fine-tune of Florence-2 for shot categorization. ☆24 Updated last month
- EfficientViT is a new family of vision models for efficient high-resolution vision. ☆24 Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 Updated this week
- Official code repository for paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts" ☆31 Updated 6 months ago
- Code for paper <Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation> in ICCV 2021. ☆13 Updated 3 years ago
- Multimodal Open Source Framework for Conversational Agent Research and Development. ☆19 Updated 2 months ago
- Source code for the paper "GeoWINE: Geolocation based Wiki, Image, News and Events Retrieval" ☆11 Updated 3 years ago
- Library for converting from RGB / GrayScale image to base64 and back. ☆19 Updated 2 years ago
- Code for paper Rethinking the Data Annotation Process for Multi-view 3D Pose Estimation with Active Learning and Self-Training ☆22 Updated 2 years ago
- Description and applications of OpenAI's paper about DALL-E (2021) and implementation of other (CLIP-guided) zero-shot text-to-image gene… ☆33 Updated 2 years ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 Updated 8 months ago
- ☆15 Updated last year
- Effective frame sampling for ML applications. ☆18 Updated 4 months ago
- Minimal, clean code for video/image "patchnization" - a process commonly used in tokenizing visual data for use in a Transformer encoder.… ☆11 Updated 11 months ago
- Official Pytorch Implementation of Self-emerging Token Labeling ☆33 Updated last year