sieve-community / fast-asd
An optimized, production-ready implementation of active speaker detection.
☆70 · Updated last year
Alternatives and similar repositories for fast-asd
Users interested in fast-asd are comparing it to the libraries listed below.
- Demo Python script app to interact with a llama.cpp server using the Whisper API, microphone, and webcam devices. ☆46 · Updated last year
- Efficient approach to speaker diarization using voice characteristics extraction ☆100 · Updated 3 months ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆67 · Updated last year
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆70 · Updated last year
- 🐍 🤖 Pip-installable package for StyleTTS 2 human-level text-to-speech and voice cloning ☆157 · Updated last year
- EdgeSAM model for use with Autodistill. ☆29 · Updated last year
- Our idea is to combine the power of computer vision models and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from imag… ☆117 · Updated 2 years ago
- whisper.cpp bindings for Python ☆102 · Updated 2 years ago
- Maybe the new state-of-the-art vision model? We'll see 🤷‍♂️ ☆166 · Updated last year
- ☆262 · Updated last year
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models. ☆66 · Updated last year
- Official Code for Tracking Any Object Amodally ☆118 · Updated last year
- Joint speech-language model: respond directly to audio! ☆372 · Updated last year
- ☆158 · Updated 2 years ago
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ ☆88 · Updated last year
- ☆206 · Updated last year
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆185 · Updated 5 months ago
- Accurately locating each head's position in crowd scenes is a crucial task in the field of crowd analysis. However, traditional densi… ☆21 · Updated last year
- Passively collect images for computer vision datasets on the edge. ☆35 · Updated last year
- ☆60 · Updated last year
- LLaVA server (llama.cpp). ☆182 · Updated last year
- ☆127 · Updated 6 months ago
- Improving transcription performance of OpenAI Whisper for CPU-based deployment ☆249 · Updated 2 years ago
- Use Segment Anything 2, grounded with Florence-2, to auto-label data for use in training vision models. ☆128 · Updated last year
- Transcription pipeline with speaker diarization ☆94 · Updated 2 years ago
- A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT ☆90 · Updated 11 months ago
- A real-time video-caption-to-conversation bot that captures frames, generates captions, and creates conversational responses using a Large … ☆123 · Updated last year
- PlayHT Python SDK: AI Text-to-Speech Streaming & Voice Cloning API ☆216 · Updated last month
- Cog wrapper for Vchitect/SEINE ☆37 · Updated last year
- Repo for active speaker detection in media videos. ☆29 · Updated last year