A repository for the code used to produce the results of the ICASSP 2024 paper: "SELF-SUPERVISED PRETRAINING FOR ROBUST PERSONALIZED VOICE ACTIVITY DETECTION IN ADVERSE CONDITIONS"
☆21Nov 25, 2024Updated last year
Alternatives and similar repositories for SSL-PVAD
Users that are interested in SSL-PVAD are comparing it to the libraries listed below.
- ☆12Nov 7, 2024Updated last year
- Tr-VAD: An Efficient Transformer based Voice Activity Detection Model☆17Aug 1, 2024Updated last year
- LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for edit and control the target language's phoneme…☆23Aug 14, 2025Updated 7 months ago
- TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR.☆26Jun 1, 2023Updated 2 years ago
- Python implementation of a few speech intelligibility prediction algorithms☆15May 29, 2024Updated last year
- ☆13Dec 1, 2021Updated 4 years ago
- An unofficial implementation of the Personal VAD speaker-conditioned voice activity detection method. Bachelor's thesis project.☆80Sep 22, 2022Updated 3 years ago
- Audio samples for the paper 'Phase-aware music super-resolution using generative adversarial networks'☆14May 15, 2020Updated 5 years ago
- Accompanying repository for the paper "Automatic Music Mixing Using a Generative Model of Effect Embeddings"☆28Jan 18, 2026Updated 2 months ago
- Pytorch implementation of "spectro-temporal attention-based voice activity detection"☆13Jun 4, 2024Updated last year
- Code for ICASSP 2024 paper WhisperSeg: Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection☆41Jul 25, 2025Updated 8 months ago
- 3D Sound Source Localization using Masked Autoencoders☆19Feb 12, 2025Updated last year
- Implementation of Sheffield entry for Clarity enhancement challenge.☆18Apr 19, 2022Updated 3 years ago
- ☆23Feb 2, 2022Updated 4 years ago
- ☆10Sep 25, 2024Updated last year
- A speech signal processing library in Python with emphasis on deep learning.☆31Jul 16, 2022Updated 3 years ago
- Convert a mono channel recording into binaural playback with headphones and loudspeakers☆13Dec 6, 2023Updated 2 years ago
- unofficial implementation of "CPTNN: CROSS-PARALLEL TRANSFORMER NEURAL NETWORK FOR TIME-DOMAIN SPEECH ENHANCEMENT"☆15Nov 14, 2023Updated 2 years ago
- A pytorch implementation of the paper "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding"☆61Sep 19, 2024Updated last year
- Text-To-Speech for NotebookLM☆39Jul 20, 2025Updated 8 months ago
- Accompanying repository for the paper "DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions"☆38Oct 28, 2025Updated 4 months ago
- Script to demonstrate how to use a Language Model for Semantic Turn Detection. Refer to blog post for full details.☆17May 9, 2025Updated 10 months ago
- Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024)☆22Jul 25, 2024Updated last year
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition☆19Jul 16, 2024Updated last year
- Fast algorithm for determined blind source separation with update of demixing filters with joint adjustment of the remaining sources.☆35Mar 22, 2021Updated 5 years ago
- Silero VAD(ncnn): pre-trained enterprise-grade Voice Activity Detector.☆24Aug 21, 2024Updated last year
- This repository contains code for applying Data2Vec to pretrain Keyword Transformer model as described in "Improving Label-Deficient Keyw…☆31Mar 6, 2025Updated last year
- Offline Speaker Diarization with SenseVoice by Sherpa ONNX.☆15Dec 23, 2024Updated last year
- Audio production style transfer with inference-time optimization☆49Nov 18, 2024Updated last year
- ☆15Jul 4, 2024Updated last year
- Official repository for the paper "xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement" (Accepted to INTERSPEECH 2025)☆58Aug 28, 2025Updated 6 months ago
- Reimplementation of Miipher☆29Aug 16, 2023Updated 2 years ago
- Script to generate VAD dataset used in Asteroid recipe☆21Sep 30, 2021Updated 4 years ago
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model.☆12Dec 24, 2022Updated 3 years ago
- One command to start a streaming ASR server.☆12Oct 2, 2024Updated last year
- Official implementation for our paper "Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations"☆41Aug 14, 2025Updated 7 months ago
- This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsuperv…☆151Jun 5, 2025Updated 9 months ago
- Matlab and Python libraries for an unsupervised method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised …☆138Jan 20, 2024Updated 2 years ago
- ☆37Feb 23, 2022Updated 4 years ago