Crowdsourced and Automatic Speech Prominence Estimation
☆25Apr 12, 2024Updated last year
Alternatives and similar repositories for emphases
Users that are interested in emphases are comparing it to the libraries listed below
Sorting:
- Offline Speaker Diarization with SenseVoice by Sherpa ONNX.☆15Dec 23, 2024Updated last year
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated 11 months ago
- A simple command line tool to calculate WER for ASR.☆14Oct 14, 2024Updated last year
- Official implementation of the paper "Distilling a Pretrained Language Model to a Multilingual ASR Model" (Interspeech 2022)☆12Mar 12, 2024Updated last year
- silero-vad pytorch implement☆35Nov 23, 2024Updated last year
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- Phoneme alignment representation compatible with multiple forced aligners☆22Apr 12, 2024Updated last year
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators☆54May 15, 2025Updated 9 months ago
- ☆23Oct 17, 2024Updated last year
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆68Nov 1, 2024Updated last year
- Demo for DART, Audio Imagination workshop submission in NeurIPS 2024☆12Apr 15, 2025Updated 10 months ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT☆40Aug 29, 2024Updated last year
- The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral)☆49Aug 15, 2025Updated 6 months ago
- A toolkit dedicate for speech evaluation.☆23Sep 26, 2024Updated last year
- CTC decoder with hotwords for ASR.☆34Apr 13, 2025Updated 10 months ago
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI…☆24Oct 8, 2025Updated 4 months ago
- Pytorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR".☆89Feb 2, 2026Updated last month
- A low-bitrate single-codebook 16 / 24 kHz speech codec based on focal modulation☆145Nov 30, 2025Updated 3 months ago
- ☆54Jul 16, 2025Updated 7 months ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated 10 months ago
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset☆12Sep 29, 2025Updated 5 months ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval☆13Jun 27, 2025Updated 8 months ago
- Data manipulation and transformation for audio signal processing, powered by PyTorch☆10Sep 30, 2024Updated last year
- ☆10Dec 22, 2023Updated 2 years ago
- A library of speech gadgets.☆14Oct 15, 2022Updated 3 years ago
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- Streaming Vocos☆30Jun 10, 2025Updated 8 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding☆51May 1, 2025Updated 10 months ago
- Official PyTorch implementation of (ICME2025 oral) "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-…☆16Feb 1, 2026Updated last month
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition☆11Mar 14, 2025Updated 11 months ago
- ☆15Nov 11, 2024Updated last year
- ☆11May 7, 2022Updated 3 years ago
- Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models☆61Jul 1, 2025Updated 8 months ago
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu…☆27Mar 5, 2024Updated last year
- Colab notebooks for Next-gen Kaldi☆30Oct 12, 2025Updated 4 months ago
- An automatic prosodic boundary annotation tool for Text-to-Speech Synthesis (TTS).☆51Jun 11, 2024Updated last year
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers☆125Mar 20, 2025Updated 11 months ago
- (WIP)long form speech generatoins☆31Apr 2, 2025Updated 11 months ago
- Spherical residual vector quantization (SRVQ)☆31Aug 25, 2024Updated last year