capjamesg / sam-gpt4v
Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models.
☆65Updated last year
Related projects
Alternatives and complementary repositories for sam-gpt4v
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆59Updated 3 months ago
- Our idea is to combine the power of computer vision models and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from imag…☆100Updated last year
- EdgeSAM model for use with Autodistill.☆25Updated 5 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…☆34Updated last year
- Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.☆95Updated 3 months ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing☆65Updated 6 months ago
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️☆77Updated last year
- GPT-4V(ision) module for use with Autodistill.☆25Updated 3 months ago
- This repository demonstrates various examples using YOLO☆13Updated 9 months ago
- Eye exploration☆22Updated last week
- The open source implementation of "NeVA: NeMo Vision and Language Assistant"☆18Updated last year
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…☆77Updated 5 months ago
- Fine-tuning the multimodal LLM "Idefics 9B" on the Pokemon Go dataset available on Hugging Face.☆18Updated 10 months ago
- Flask-based web application designed to compare text and image embeddings using the CLIP model.☆22Updated 10 months ago
- Implementation of Grounding DINO and Segment Anything that allows prompt-based masking, which is useful for programmed inpainting.☆34Updated last year
- Framework agnostic computer vision inference. Run 1000+ models by changing only one line of code. Supports models from transformers, timm…☆121Updated this week
- An optimized, production-ready implementation of active speaker detection☆54Updated 5 months ago
- A Gradio web UI for Depth-Pro, Sharp Monocular Metric Depth Estimation☆45Updated last month
- Streamlit app presented to the Streamlit LLMs Hackathon September 23☆15Updated 6 months ago
- Use miniGPT-4 in batch mode to generate captions for a large number of images. You should be able to create the best captions you always wanted!☆17Updated last year
- GroundedSAM Base Model plugin for Autodistill☆45Updated 7 months ago
- Fast Real-time Object Detection with High-Res Output https://x.com/_akhaliq/status/1840213012818329826☆52Updated last month
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX.☆32Updated 2 months ago