An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification
☆24 · Sep 22, 2024 · Updated last year
Alternatives and similar repositories for SLT22_MultiHead-Factorized-Attentive-Pooling
Users that are interested in SLT22_MultiHead-Factorized-Attentive-Pooling are comparing it to the libraries listed below
- ☆10 · Dec 22, 2023 · Updated 2 years ago
- Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech ☆11 · May 14, 2025 · Updated 9 months ago
- ☆12 · Feb 9, 2021 · Updated 5 years ago
- MSP-Podcast Challenge Baseline Code for Interspeech 2025 ☆28 · Dec 4, 2024 · Updated last year
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆31 · Aug 30, 2025 · Updated 6 months ago
- Audio Research in US. US-based professors who work on audio (music, speech, acoustics). For students who would like to apply for RA, PhD,… ☆27 · Feb 27, 2026 · Updated last week
- [ICASSP'24] Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Speaker Verification ☆16 · Mar 20, 2024 · Updated last year
- LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform ☆18 · May 17, 2024 · Updated last year
- Visual Speech Recognition ☆19 · Dec 24, 2024 · Updated last year
- Emotion Recognition from Brazilian Portuguese Informal Spontaneous Speech ☆21 · Mar 21, 2022 · Updated 3 years ago
- Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024) ☆22 · Jul 25, 2024 · Updated last year
- AdvSV stands as the first dataset developed specifically for evaluating Speaker Verification (SV) systems against adversarial attacks. I… ☆11 · Nov 21, 2023 · Updated 2 years ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆38 · Jan 6, 2024 · Updated 2 years ago
- A unified model for zero-shot singing voice conversion and synthesis ☆22 · Nov 30, 2022 · Updated 3 years ago
- A Model (maybe an app) that translates the audio of a video from one language to another language, cloning the voice of original video wi… ☆15 · May 19, 2025 · Updated 9 months ago
- ☆11 · Jun 14, 2024 · Updated last year
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a… ☆12 · Mar 24, 2023 · Updated 2 years ago
- ☆46 · Feb 16, 2023 · Updated 3 years ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 8 months ago
- 🎵 muse: Music Separation ☆11 · Feb 14, 2024 · Updated 2 years ago
- Vox-Profile Benchmark ☆71 · Feb 16, 2026 · Updated 2 weeks ago
- SSL Layerwise analysis for speech deepfake detection ☆32 · Aug 5, 2025 · Updated 7 months ago
- ☆33 · Dec 23, 2025 · Updated 2 months ago
- CML-TTS: A Multilingual Dataset for Speech Synthesis ☆33 · Jul 31, 2024 · Updated last year
- Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment ☆13 · Feb 5, 2025 · Updated last year
- ☆14 · Jun 16, 2023 · Updated 2 years ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition ☆11 · Mar 14, 2025 · Updated 11 months ago
- Room impulse response simulation for various array architectures using Monte-Carlo simulation and quaternions (Python) ☆17 · Feb 25, 2026 · Updated last week
- ☆13 · Jan 5, 2025 · Updated last year
- FINALLY: Fast and universal speech enhancement model delivering studio-quality audio for a wide range of recordings. ☆25 · Dec 11, 2025 · Updated 2 months ago
- SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech ☆11 · Jun 30, 2023 · Updated 2 years ago
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset. ☆28 · Apr 16, 2024 · Updated last year
- ☆32 · Jan 6, 2022 · Updated 4 years ago
- ☆157 · Jan 9, 2023 · Updated 3 years ago
- Simple tool for speech dataset augmentation for modeling various prosodies. ☆14 · Jan 14, 2021 · Updated 5 years ago
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model ☆13 · Mar 30, 2025 · Updated 11 months ago
- PyTorch implementation of the paper: A Global-local Attention Framework for Weakly Labelled Audio Tagging. ☆13 · Feb 6, 2021 · Updated 5 years ago
- A simple command line tool to calculate WER for ASR. ☆14 · Oct 14, 2024 · Updated last year
- DysfluentWFST ☆18 · Nov 13, 2025 · Updated 3 months ago