sieve-community / fast-asd
an optimized, production-ready implementation of active speaker detection
☆71Updated last year
Alternatives and similar repositories for fast-asd
Users that are interested in fast-asd are comparing it to the libraries listed below
- Demo python script app to interact with llama.cpp server using whisper API, microphone and webcam devices.☆45Updated last year
- Efficient approach to speaker diarization using voice characteristics extraction☆102Updated 3 months ago
- ☆261Updated last year
- Joint speech-language model - respond directly to audio!☆371Updated last year
- Our idea is to combine the power of computer vision models and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from imag…☆116Updated 2 years ago
- ☆61Updated 2 years ago
- Use Grounding DINO, Segment Anything, and GPT-4V to label images with segmentation masks for use in training smaller, fine-tuned models.☆65Updated last year
- 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning☆157Updated last year
- ☆207Updated last year
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️☆87Updated last year
- ☆157Updated 2 years ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing☆69Updated last year
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration☆185Updated 5 months ago
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation☆392Updated 2 years ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models.☆67Updated last year
- ☆62Updated last year
- Speech To Speech: an effort for an open-sourced and modular GPT4-o☆71Updated last year
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation☆181Updated 2 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆68Updated last month
- ☆174Updated last year
- ☆127Updated 6 months ago
- Cog wrapper for Vchitect/SEINE☆37Updated last year
- LLaVA server (llama.cpp).☆183Updated last year
- A real-time video caption to conversation bot that captures frames, generates captions, and creates conversational responses using a Large …☆123Updated last year
- whisper.cpp bindings for python☆106Updated 2 years ago
- The code for some apps built with Sieve.☆82Updated 10 months ago
- Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription☆99Updated last month
- PlayHT Python SDK - AI Text-to-Speech Streaming & Voice Cloning API☆217Updated 2 months ago
- Video+code lecture on building nanoGPT from scratch☆68Updated last year
- Maybe the new state of the art vision model? we'll see 🤷‍♂️☆165Updated last year