sovit-123 / SAM_Molmo_Whisper
An integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language.
⭐30 · Updated 10 months ago
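The tagline above implies a three-stage pipeline: Whisper turns speech into a text prompt, Molmo grounds that prompt to a point on the image, and SAM turns the point into a mask. A minimal sketch of that flow, with all three models replaced by hypothetical stubs (the real repo wires up the actual models):

```python
# Hypothetical sketch of the voice-to-mask pipeline described above.
# All three functions are stand-ins, not the repo's actual API.

def transcribe(audio):
    # Stand-in for Whisper speech-to-text.
    return "segment the red car"

def point_at(image, prompt):
    # Stand-in for Molmo returning an (x, y) point for the prompt.
    return (120, 80)

def segment(image, point):
    # Stand-in for SAM turning a point prompt into a mask.
    return {"point": point, "mask": "binary mask here"}

def voice_to_mask(image, audio):
    prompt = transcribe(audio)        # 1. voice -> natural-language prompt
    point = point_at(image, prompt)   # 2. prompt -> grounded point on image
    return segment(image, point)      # 3. point -> segmentation mask

result = voice_to_mask(image=None, audio=None)
```

The composition is the point here: each stage's output is the next stage's prompt, which is what lets a voice command end as a segmentation mask.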
Alternatives and similar repositories for SAM_Molmo_Whisper
Users interested in SAM_Molmo_Whisper are comparing it to the repositories listed below.
- Eye exploration ⭐31 · Updated last month
- Inference and fine-tuning examples for vision models from 🤗 Transformers ⭐163 · Updated 5 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ⭐68 · Updated last year
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard. ⭐95 · Updated 3 weeks ago
- Take your LLM to the optometrist. ⭐43 · Updated last month
- EdgeSAM model for use with Autodistill. ⭐29 · Updated last year
- Using the moondream VLM with optical flow for promptable object tracking ⭐72 · Updated 10 months ago
- VLM-driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 V… ⭐126 · Updated 7 months ago
- Solving Computer Vision with AI agents ⭐35 · Updated 6 months ago
- This repository demonstrates various examples using YOLO. ⭐13 · Updated last year
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models. ⭐66 · Updated 2 years ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ⭐85 · Updated last year
- Vehicle speed estimation using YOLOv8 ⭐32 · Updated last year
- An SDK for Transformers + YOLO and other SSD family models ⭐64 · Updated 11 months ago
- Unofficial implementation and experiments related to Set-of-Mark (SoM) ⭐88 · Updated 2 years ago
- Fine-tune Gemma 3 on an object detection task ⭐95 · Updated 6 months ago
- Flask-based web application designed to compare text and image embeddings using the CLIP model. ⭐21 · Updated last year
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ⭐34 · Updated last year
- This repo is a packaged version of the YOLOv9 model. ⭐87 · Updated last month
- Automatic Thief Detection via CCTV with Alarm System and Perpetrator Image Capture using YOLOv5 + ROI. This project utilizes computer vis… ⭐14 · Updated last year
- ⭐114 · Updated last year
- Use Grounding DINO, Segment Anything, and CLIP to label objects in images. ⭐34 · Updated 2 years ago
- YOLOExplorer: Iterate on your YOLO / CV datasets using SQL, vector semantic search, and more within seconds ⭐138 · Updated last week
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ⭐12 · Updated last year
- ⭐93 · Updated last month
- Creation of annotated datasets from scratch using Generative AI and Foundation Computer Vision models ⭐132 · Updated last month
- ⭐34 · Updated last year
- Ultralytics Notebooks ⭐176 · Updated last month
- Real-time object detection using Florence-2 with a user-friendly GUI. ⭐30 · Updated 5 months ago
- Notebooks for fine-tuning PaliGemma ⭐117 · Updated 9 months ago