sieve-community / fast-asd
an optimized, production-ready implementation of active speaker detection
☆75Updated last year
Alternatives and similar repositories for fast-asd
Users that are interested in fast-asd are comparing it to the libraries listed below
- Efficient approach to speaker diarization using voice characteristics extraction☆105Updated 6 months ago
- Our idea is to combine the power of computer vision model and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from imag…☆118Updated 2 years ago
- 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning☆161Updated last year
- Demo python script app to interact with llama.cpp server using whisper API, microphone and webcam devices.☆46Updated 2 years ago
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️☆88Updated 2 years ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o☆73Updated last year
- ☆158Updated 2 years ago
- ☆261Updated last year
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models.☆65Updated 2 years ago
- The code for some apps built with Sieve.☆85Updated last year
- Joint speech-language model - respond directly to audio!☆372Updated last year
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing☆69Updated last year
- ☆61Updated 2 years ago
- VideoDB Python SDK☆84Updated this week
- Passively collect images for computer vision datasets on the edge.☆35Updated 2 years ago
- Cog wrapper for Vchitect/SEINE☆37Updated 2 years ago
- VLM driven tool that processes surveillance videos, extracts frames, and generates insightful annotations using a fine-tuned Florence-2 V…☆125Updated 6 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆68Updated last year
- Example of YOLOv8 object detection on browser. It is powered by ONNX and TFJS and served through JavaScript without any frameworks. It de…☆38Updated last year
- ☆62Updated last year
- LLaVA server (llama.cpp).☆183Updated 2 years ago
- Create topological graph for image segments.☆22Updated last year
- A real-time video caption to conversation bot that captures frames, generates captions, and creates conversational responses using a Large …☆122Updated 2 years ago
- Accurately locating each head's position in the crowd scenes is a crucial task in the field of crowd analysis. However, traditional densi…☆21Updated last year
- Improving transcription performance of OpenAI Whisper for CPU based deployment☆256Updated 3 years ago
- ☆127Updated 9 months ago
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration☆193Updated 8 months ago
- ☆206Updated last year
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆69Updated 2 months ago
- Extract information, summarize, ask questions, and search videos using OpenAI's Vision API 🚀🎦☆62Updated 2 years ago