sovit-123 / SAM_Molmo_Whisper
An integration of Segment Anything Model, Molmo, and Whisper to segment objects using voice and natural language.
☆25Updated 2 months ago
Alternatives and similar repositories for SAM_Molmo_Whisper
Users who are interested in SAM_Molmo_Whisper are comparing it to the libraries listed below
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆63Updated 9 months ago
- EdgeSAM model for use with Autodistill.☆26Updated 11 months ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio…☆80Updated 11 months ago
- Inference and fine-tuning examples for vision models from 🤗 Transformers☆139Updated last week
- Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models.☆121Updated 9 months ago
- Solving Computer Vision with AI agents☆31Updated last week
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard.☆70Updated this week
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️☆86Updated last year
- ☆57Updated 5 months ago
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch☆93Updated 4 months ago
- Using the moondream VLM with optical flow for promptable object tracking☆54Updated 2 months ago
- Use Grounding DINO, Segment Anything, and CLIP to label objects in images.☆31Updated last year
- ☆21Updated 6 months ago
- ☆69Updated last month
- Lightweight models for real-time semantic segmentation on PyTorch (including SQNet, LinkNet, SegNet, UNet, ENet, ERFNet, EDANet, ESPNet, ESP…☆11Updated last year
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…☆35Updated last year
- VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 V…☆110Updated 7 months ago
- Code examples showing how to use Gemini, Gemma, Imagen, and more.☆39Updated last month
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models.☆66Updated last year
- Agentic RAG to help you build a startup🚀☆41Updated last month
- Eye exploration☆27Updated 3 months ago
- This repository contains a fork of "language-models-trajectory-generators"; the goal is to test the same functionality with Mistrals LL…☆21Updated 7 months ago
- Testbed for multimodal retrieval augmented generation techniques with FiftyOne, LlamaIndex, and Milvus☆18Updated 9 months ago
- YOLOv10: Real-Time End-to-End Object Detection☆10Updated 11 months ago
- ☆16Updated last year
- Multimodal AI agent with Llama 3.2: A Streamlit app that processes text, images, PDFs, and PPTs, integrating NIM microservices, Milvus, a…☆116Updated 7 months ago
- This repository demonstrates various examples using YOLO☆13Updated last year
- Notebooks for fine-tuning PaliGemma☆102Updated last month
- Notebooks using the Neural Magic libraries 📓☆41Updated 9 months ago
- Advanced Coding AI Assistant that uses a Gradio interface to stream coding-related responses. ChatRAG supports local and API inference an…☆21Updated last week