sieve-community / fast-asd
An optimized, production-ready implementation of active speaker detection
☆67 Updated last year
Alternatives and similar repositories for fast-asd
Users who are interested in fast-asd are comparing it to the repositories listed below.
- Demo Python script app to interact with a llama.cpp server using the whisper API, microphone, and webcam devices. ☆46 Updated last year
- Efficient approach to speaker diarization using voice characteristics extraction ☆97 Updated last month
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆64 Updated 9 months ago
- 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning ☆160 Updated last year
- Our idea is to combine the power of computer vision model and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from imag… ☆116 Updated 2 years ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆69 Updated last year
- ☆158 Updated 2 years ago
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆64 Updated 11 months ago
- ☆260 Updated last year
- Joint speech-language model - respond directly to audio! ☆371 Updated last year
- ☆14 Updated last year
- Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription ☆86 Updated last year
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation ☆385 Updated last year
- ☆60 Updated last year
- Improving transcription performance of OpenAI Whisper for CPU based deployment ☆246 Updated 2 years ago
- A project that optimizes Whisper for low latency inference using NVIDIA TensorRT ☆86 Updated 9 months ago
- A real-time video caption to conversation bot that captures frames generates captions and creates conversational responses using a Large … ☆122 Updated last year
- TTS with The Massively Multilingual Speech (MMS) project ☆233 Updated last year
- NVIDIA Riva runnable tutorials ☆135 Updated last month
- The code for some apps built with Sieve. ☆81 Updated 7 months ago
- ☆205 Updated last year
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆63 Updated last month
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ☆137 Updated last year
- Salient feature extractor based on yoloV8 ☆72 Updated 2 years ago
- ☆36 Updated 2 years ago
- Cog wrapper for Vchitect/SEINE ☆37 Updated last year
- ☆300 Updated last year
- A high-throughput and memory-efficient inference and serving engine for Whisper, https://mesolitica.com/blog/vllm-whisper ☆28 Updated 11 months ago
- Transcription with speaker diarization pipeline ☆94 Updated 2 years ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation ☆173 Updated 2 months ago