inuwamobarak / OWLv2
Introducing OWLv2: Google's Breakthrough in Zero-Shot Object Detection
☆10 Updated last year
Related projects
Alternatives and complementary repositories for OWLv2
- ImageSlider custom component for gradio. ☆29 Updated 5 months ago
- MODNet for clothing matting ☆16 Updated 3 years ago
- Library for converting from RGB / GrayScale image to base64 and back. ☆19 Updated 2 years ago
- CLIP Chinese encoder ☆21 Updated 2 years ago
- RAdam + Lookahead implemented in TensorFlow ☆11 Updated 5 years ago
- ☆25 Updated 3 years ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆11 Updated last month
- Karras et al. (2022) diffusion models for PyTorch ☆17 Updated last year
- Extended Annotations of DeepFashion Images for Fine-grained Recognition ☆13 Updated 5 years ago
- Towards a rotationally invariant convolutional layer ☆10 Updated 5 years ago
- ☆29 Updated 2 years ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 Updated last week
- Supervoice Speaker Separation Network ☆13 Updated 5 months ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 Updated 9 months ago
- Building a VLM model starts from the basic module. ☆10 Updated 7 months ago
- ☆13 Updated last year
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆34 Updated last year
- ☆22 Updated 2 years ago
- Codebase for the Recognize Anything Model (RAM) ☆63 Updated 11 months ago
- Official code repository for paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts" ☆24 Updated last month
- ☆16 Updated 6 months ago
- I have created a dataset of Image-Text-Pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f… ☆15 Updated 3 years ago
- Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron` ☆12 Updated 2 years ago
- ☆11 Updated 2 months ago
- A pipeline focused on the in-painting of text in images. For example, the removal of subtitles in a screenshot of a movie. ☆13 Updated 2 years ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 Updated 2 months ago
- Demo combining Whisper for speech recognition and Google TTS for speech synthesis to interact with Alpaca-LoRA. ☆18 Updated 6 months ago
- This script is used to augment image data created using LabelMe-MIT. ☆10 Updated 2 years ago
- EfficientViT is a new family of vision models for efficient high-resolution vision. ☆22 Updated last year