sieve-community / fast-asd
An optimized, production-ready implementation of active speaker detection
☆62 · Updated 11 months ago
Alternatives and similar repositories for fast-asd
Users who are interested in fast-asd are comparing it to the libraries listed below.
- Efficient approach to speaker diarization using voice characteristics extraction ☆94 · Updated last year
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 · Updated last month
- Demo Python script app for interacting with a llama.cpp server using the Whisper API, microphone, and webcam devices. ☆46 · Updated last year
- ☆256 · Updated last year
- VideoDB Python SDK ☆70 · Updated last week
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation ☆155 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for Whisper, https://mesolitica.com/blog/vllm-whisper ☆25 · Updated 9 months ago
- Repo for active speaker detection in media videos. ☆26 · Updated last year
- Use Florence 2 to auto-label data for use in training fine-tuned object detection models. ☆63 · Updated 8 months ago
- A quality zero-shot lipsync pipeline built with MuseTalk, LivePortrait, and CodeFormer. ☆37 · Updated 7 months ago
- ☆204 · Updated 11 months ago
- Incredibly descriptive audiovisual summaries for videos ☆40 · Updated 9 months ago
- ☆132 · Updated last week
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆68 · Updated 11 months ago
- ☆62 · Updated 9 months ago
- A project that optimizes Whisper for low latency inference using NVIDIA TensorRT ☆81 · Updated 6 months ago
- Create a dataset from a list of YouTube links easily. ☆17 · Updated 2 years ago
- ☆287 · Updated 10 months ago
- A WebRTC server that allows you to interact with an LLM using your speech and responds back with generated audio. ☆130 · Updated 10 months ago
- A streaming Whisper server for on-prem transcription ☆20 · Updated 8 months ago
- ☆36 · Updated last year
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ☆94 · Updated last year
- Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language ☆77 · Updated 11 months ago
- Notebooks using the Neural Magic libraries 📓 ☆41 · Updated 9 months ago
- Misc. tools/scripts that I made for use with tortoise ☆21 · Updated 8 months ago
- Official code for the paper "GestSync: Determining who is speaking without a talking head" published at BMVC 2023 ☆46 · Updated 8 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆244 · Updated last month
- The code for some apps built with Sieve. ☆79 · Updated 5 months ago
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆163 · Updated 3 weeks ago
- PyTorch implementation of EfficientSpeech, to be presented at ICASSP 2023. ☆169 · Updated last year