JunyiPeng00 / SLT22_MultiHead-Factorized-Attentive-Pooling
An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification
☆24 · Sep 22, 2024 · Updated last year
Alternatives and similar repositories for SLT22_MultiHead-Factorized-Attentive-Pooling
Users who are interested in SLT22_MultiHead-Factorized-Attentive-Pooling compare it to the repositories listed below
- ☆10 · Dec 22, 2023 · Updated 2 years ago
- ☆12 · Feb 9, 2021 · Updated 5 years ago
- Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech ☆11 · May 14, 2025 · Updated 9 months ago
- MSP-Podcast Challenge Baseline Code for Interspeech 2025 ☆28 · Dec 4, 2024 · Updated last year
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆31 · Aug 30, 2025 · Updated 5 months ago
- Audio Research in US. US-based professors who work on audio (music, speech, acoustics). For students who would like to apply for RA, PhD,… ☆27 · Nov 13, 2025 · Updated 3 months ago
- LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform ☆18 · May 17, 2024 · Updated last year
- [ICASSP'24] Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Speaker Verification ☆16 · Mar 20, 2024 · Updated last year
- Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024) ☆22 · Jul 25, 2024 · Updated last year
- Emotion Recognition from Brazilian Portuguese Informal Spontaneous Speech ☆21 · Mar 21, 2022 · Updated 3 years ago
- Visual Speech Recognition ☆19 · Dec 24, 2024 · Updated last year
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆38 · Jan 6, 2024 · Updated 2 years ago
- AdvSV stands as the first dataset developed specifically for evaluating Speaker Verification (SV) systems against adversarial attacks. I… ☆11 · Nov 21, 2023 · Updated 2 years ago
- Vox-Profile Benchmark ☆67 · Sep 12, 2025 · Updated 5 months ago
- A unified model for zero-shot singing voice conversion and synthesis ☆22 · Nov 30, 2022 · Updated 3 years ago
- ☆11 · Jun 14, 2024 · Updated last year
- 🎵 muse: Music Separation ☆11 · Feb 14, 2024 · Updated 2 years ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 7 months ago
- A Model (maybe an app) that translates the audio of a video from one language to another language, cloning the voice of original video wi… ☆15 · May 19, 2025 · Updated 8 months ago
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a… ☆12 · Mar 24, 2023 · Updated 2 years ago
- ☆46 · Feb 16, 2023 · Updated 2 years ago
- SSL Layerwise analysis for speech deepfake detection ☆32 · Aug 5, 2025 · Updated 6 months ago
- CML-TTS: A Multilingual Dataset for Speech Synthesis ☆33 · Jul 31, 2024 · Updated last year
- ☆32 · Dec 23, 2025 · Updated last month
- Room impulse response simulation for various array architectures using Monte-Carlo simulation and quaternions (Python) ☆17 · May 25, 2025 · Updated 8 months ago
- ☆13 · Jan 5, 2025 · Updated last year
- DysfluentWFST ☆17 · Nov 13, 2025 · Updated 3 months ago
- ☆14 · Jun 16, 2023 · Updated 2 years ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition ☆11 · Mar 14, 2025 · Updated 11 months ago
- SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech ☆11 · Jun 30, 2023 · Updated 2 years ago
- Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment ☆12 · Feb 5, 2025 · Updated last year
- DPDFNet: causal single-channel speech enhancement that boosts DeepFilterNet2 with dual-path RNN blocks for stronger long-range temporal a… ☆30 · Updated this week
- FINALLY: Fast and universal speech enhancement model delivering studio-quality audio for a wide range of recordings. ☆25 · Dec 11, 2025 · Updated 2 months ago
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset. ☆28 · Apr 16, 2024 · Updated last year
- ☆32 · Jan 6, 2022 · Updated 4 years ago
- ☆157 · Jan 9, 2023 · Updated 3 years ago
- PyTorch Implementation of [WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification](https://arxiv.or… ☆16 · Jul 31, 2025 · Updated 6 months ago
- Simple tool for speech dataset augmentation for modeling various prosodies. ☆14 · Jan 14, 2021 · Updated 5 years ago
- A simple command line tool to calculate WER for ASR. ☆14 · Oct 14, 2024 · Updated last year