aizhiqi-work / MM-KWS
Code for the Interspeech 2024 paper "MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting"
☆45 · Jan 24, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for MM-KWS
Users who are interested in MM-KWS are comparing it to the repositories listed below:
- Official implementation of "PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords" (INTERSPEECH 2023) ☆59 · Jun 3, 2024 · Updated last year
- Official code for Metric learning for user-defined keyword spotting ☆38 · Feb 21, 2024 · Updated last year
- Test-time adaptation for speech recognition model by single utterance. The official implementation of "Listen, Adapt, Better WER: Source-… ☆20 · Apr 1, 2022 · Updated 3 years ago
- Recipe for LibriPhrase ☆33 · Sep 2, 2023 · Updated 2 years ago
- Test Framework for few-shot open set KWS ☆41 · Nov 8, 2024 · Updated last year
- [Tiny KWS] SparkNet: Sparse Binarization for Fast Keyword Spotting ☆17 · Aug 26, 2025 · Updated 5 months ago
- ☆32 · Aug 10, 2022 · Updated 3 years ago
- Collection of PyTorch implementations of Spoken Keyword Spotting presented in research papers. ☆36 · Apr 5, 2024 · Updated last year
- E2E ASR system ☆14 · Oct 20, 2022 · Updated 3 years ago
- This repository contains code for applying Data2Vec to pretrain Keyword Transformer model as described in "Improving Label-Deficient Keyw… ☆30 · Mar 6, 2025 · Updated 11 months ago
- End-to-End Speech Processing Toolkit ☆15 · Jan 20, 2025 · Updated last year
- ☆11 · Sep 1, 2024 · Updated last year
- Official code for Dense-TSNet ☆12 · Sep 17, 2024 · Updated last year
- ☆24 · Aug 29, 2025 · Updated 5 months ago
- [ICASSP 2023] Tempo vs. Pitch: understanding self-supervised tempo estimation ☆13 · Aug 2, 2023 · Updated 2 years ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 7 months ago
- ☆19 · Aug 25, 2025 · Updated 5 months ago
- ☆87 · May 31, 2023 · Updated 2 years ago
- Few-Shot Keyword Spotting ☆70 · Apr 11, 2021 · Updated 4 years ago
- Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus ☆184 · Dec 6, 2024 · Updated last year
- PyTorch based toolkit for developing spiking neural networks (SNNs) by training and testing them on speech command recognition tasks ☆30 · May 3, 2024 · Updated last year
- This is an extension of the Kaldi speech recognition software that allows decoding of speech with hybrid word and phoneme graphs.… ☆11 · Feb 4, 2020 · Updated 6 years ago
- Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment ☆12 · Feb 5, 2025 · Updated last year
- DPDFNet: causal single-channel speech enhancement that boosts DeepFilterNet2 with dual-path RNN blocks for stronger long-range temporal a… ☆30 · Updated this week
- Production First and Production Ready End-to-End Keyword Spotting Toolkit ☆691 · Sep 17, 2025 · Updated 4 months ago
- PyTorch implementation of BiFSMNv2, TNNLS 2023 ☆35 · Feb 10, 2023 · Updated 3 years ago
- This is a repository for a paper accepted at the 2022 IEEE Spoken Language Technology Workshop (SLT 2022) ☆16 · Dec 1, 2022 · Updated 3 years ago
- This is the unofficial implementation of MFNet, from the paper "A Mask Free Neural Network for Monaural Speech Enhancement" ☆13 · Dec 20, 2024 · Updated last year
- DiffPhase: Generative Diffusion-based STFT Phase Retrieval ☆16 · Sep 21, 2023 · Updated 2 years ago
- Voice activity detection and speaker gender segmentation audiovisual corpus ☆16 · Jan 20, 2025 · Updated last year
- ☆12 · May 30, 2023 · Updated 2 years ago
- FNSE-SBGAN: Far-field Speech Enhancement with Schrödinger Bridge and Generative Adversarial Networks ☆17 · May 12, 2025 · Updated 9 months ago
- Conformer block with Rotary Position Embedding, modified from lucidrains' implementation ☆16 · Sep 13, 2024 · Updated last year
- PyTorch reimplementation of "Keyword Transformer: A Self-Attention Model for Keyword Spotting" ☆16 · Jul 23, 2021 · Updated 4 years ago
- ☆15 · Aug 25, 2022 · Updated 3 years ago
- This repository provides data and code for "Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription" paper. ☆16 · Jul 22, 2021 · Updated 4 years ago
- This is a public repository for RATS Channel-A Speech Data, which is a chargeable noisy speech dataset under LDC. Here we release its Log… ☆16 · Oct 22, 2022 · Updated 3 years ago
- A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors ☆24 · Jul 30, 2025 · Updated 6 months ago
- Feature extraction for accented-speech or pathological speech ☆17 · Apr 2, 2019 · Updated 6 years ago