sovit-123 / SAM_Molmo_Whisper
An integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language.
☆12 · Updated 3 weeks ago
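The description implies a three-stage pipeline: Whisper turns speech into a text query, Molmo grounds that query as point coordinates on the image, and SAM turns the points into a segmentation mask. Below is a minimal sketch of that flow, assuming Hugging Face checkpoints for Whisper and Molmo and the official `segment_anything` predictor; the model IDs, file paths, prompt wording, and point-tag parsing are illustrative assumptions, not this repository's actual code (the repo may, for instance, use SAM 2 instead).

```python
import re
import numpy as np
from PIL import Image
from transformers import pipeline, AutoModelForCausalLM, AutoProcessor, GenerationConfig
from segment_anything import sam_model_registry, SamPredictor

# 1. Voice -> text with Whisper (checkpoint choice is an assumption).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
query = asr("command.wav")["text"]  # e.g. "the dog on the left"

# 2. Text + image -> point(s) with Molmo, which answers pointing prompts
# with <point x="..." y="..."> tags, coordinates as percentages.
processor = AutoProcessor.from_pretrained("allenai/Molmo-7B-D-0924", trust_remote_code=True)
molmo = AutoModelForCausalLM.from_pretrained("allenai/Molmo-7B-D-0924", trust_remote_code=True)
image = Image.open("scene.jpg").convert("RGB")
inputs = processor.process(images=[image], text=f"Point to {query}.")
inputs = {k: v.to(molmo.device).unsqueeze(0) for k, v in inputs.items()}
out = molmo.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=128, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
answer = processor.tokenizer.decode(
    out[0, inputs["input_ids"].size(1):], skip_special_tokens=True
)

# 3. Point(s) -> mask with SAM, using the points as foreground prompts.
# Note: Molmo uses a different tag layout (x1=, y1=, ...) for multi-point
# answers; this regex only covers the single-point case.
w, h = image.size
points = np.array(
    [(float(x) / 100 * w, float(y) / 100 * h)
     for x, y in re.findall(r'x="([\d.]+)" y="([\d.]+)"', answer)]
)
assert len(points) > 0, "Molmo returned no point tags for this query"
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)
predictor.set_image(np.array(image))
masks, scores, _ = predictor.predict(
    point_coords=points, point_labels=np.ones(len(points), dtype=int)
)
```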
Alternatives and similar repositories for SAM_Molmo_Whisper:
Users interested in SAM_Molmo_Whisper are comparing it to the repositories listed below:
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆60 · Updated 4 months ago
- EdgeSAM model for use with Autodistill. ☆26 · Updated 6 months ago
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models. ☆65 · Updated last year
- Real-time, YOLO-like object detection using the Florence-2-base-ft model with a user-friendly GUI. ☆15 · Updated last week
- OcSort-Pip: Packaged version of the OcSort repository ☆14 · Updated 2 years ago
- OLA-VLM: Elevating Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆44 · Updated 3 weeks ago
- A simple demo for using Grounding DINO and Segment Anything v2 models together ☆19 · Updated 5 months ago
- Notebook and scripts that showcase running quantized diffusion models on consumer GPUs ☆37 · Updated 2 months ago
- Python scripts performing optical flow estimation using the NeuFlowV2 model in ONNX. ☆38 · Updated 3 months ago
- Simple CogVLM client script ☆14 · Updated last year
- Lightweight models for real-time semantic segmentation in PyTorch (including SQNet, LinkNet, SegNet, UNet, ENet, ERFNet, EDANet, ESPNet, ESP… ☆11 · Updated last year
- A list of language models with permissive licenses such as MIT or Apache 2.0 ☆24 · Updated 2 months ago
- ☆29 · Updated last month
- Use Grounding DINO, Segment Anything, and CLIP to label objects in images. ☆23 · Updated last year
- This repository demonstrates various examples using YOLO ☆13 · Updated 11 months ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 · Updated 5 months ago
- Vehicle speed estimation using YOLOv8 ☆30 · Updated 9 months ago
- ☆23 · Updated 2 months ago
- Chat with Qwen2-VL. Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud. ☆9 · Updated 3 months ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆66 · Updated 7 months ago
- ☆30 · Updated last year
- Official code repository for the paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts" ☆28 · Updated 3 months ago
- Coding an LLM and its building blocks from scratch. ☆15 · Updated 3 weeks ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆79 · Updated 7 months ago
- Code repository for the blog "How to Productionize Large Language Models (LLMs)" ☆11 · Updated 9 months ago
- Fine-tuning the multimodal LLM "Idefics 9B" on the Pokemon Go dataset available on Hugging Face. ☆19 · Updated 11 months ago
- Code and pretrained models for the paper: "MatMamba: A Matryoshka State Space Model" ☆54 · Updated last month
- GPT-4V(ision) module for use with Autodistill. ☆26 · Updated 5 months ago
- Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which includ… ☆32 · Updated last week
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆34 · Updated last year