[INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
☆45 · Mar 25, 2024 · Updated last year
Alternatives and similar repositories for MT4SSL
Users interested in MT4SSL are comparing it to the repositories listed below.
- Project page for "Improving Few-shot Learning for Talking Face System with TTS Data Augmentation" for ICASSP 2023 ☆86 · Oct 10, 2023 · Updated 2 years ago
- Paper, Code and Statistics for Self-Supervised Learning and Pre-Training on Speech. ☆211 · Jan 18, 2024 · Updated 2 years ago
- Speech samples and code of BEdit-TTS ☆34 · Oct 8, 2023 · Updated 2 years ago
- Implementation of CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning ☆48 · Nov 8, 2023 · Updated 2 years ago
- Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features. ☆39 · Mar 4, 2024 · Updated 2 years ago
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆199 · Feb 25, 2026 · Updated 3 weeks ago
- INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models" ☆117 · Jan 26, 2024 · Updated 2 years ago
- A perceptual weighting filter loss for DNN training in speech enhancement ☆24 · Apr 30, 2022 · Updated 3 years ago
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner. ☆118 · Oct 17, 2025 · Updated 5 months ago
- Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning ☆97 · Nov 20, 2024 · Updated last year
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec. ☆48 · Sep 4, 2023 · Updated 2 years ago
- [AAAI 2024] Code for CTX-vec2wav in UniCATS ☆130 · Jun 11, 2024 · Updated last year
- ☆12 · Nov 7, 2024 · Updated last year
- AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th… ☆11 · Feb 23, 2024 · Updated 2 years ago
- LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT ☆74 · Sep 26, 2022 · Updated 3 years ago
- Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models ☆52 · Sep 2, 2025 · Updated 6 months ago
- Code release for "TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices" ☆21 · Jun 7, 2025 · Updated 9 months ago
- ☆11 · May 9, 2023 · Updated 2 years ago
- Forced alignment decoder for Whisper. ☆15 · Mar 13, 2024 · Updated 2 years ago
- ☆37 · Nov 18, 2025 · Updated 4 months ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024 ☆32 · Mar 14, 2025 · Updated last year
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆221 · Nov 30, 2025 · Updated 3 months ago
- ☆13 · Sep 25, 2024 · Updated last year
- TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR. ☆26 · Jun 1, 2023 · Updated 2 years ago
- A wrapper for Audeering's wav2vec-based dimensional speech emotion recognition ☆21 · Aug 9, 2023 · Updated 2 years ago
- ☆19 · Mar 22, 2024 · Updated 2 years ago
- A JAX library for building lattice-based speech transducer models ☆47 · Mar 2, 2026 · Updated 2 weeks ago
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11… ☆46 · Jul 2, 2024 · Updated last year
- [NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words ☆56 · Jun 25, 2024 · Updated last year
- Text-To-Speech for NotebookLM ☆39 · Jul 20, 2025 · Updated 8 months ago
- ☆11 · May 7, 2022 · Updated 3 years ago
- Voice conversion training with 109 speakers with limited training samples ☆35 · Dec 21, 2020 · Updated 5 years ago
- Code for "Distribution-based Emotion Recognition in Conversation" ☆19 · Feb 6, 2023 · Updated 3 years ago
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement ☆186 · Sep 1, 2025 · Updated 6 months ago
- [AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS ☆64 · Nov 18, 2024 · Updated last year
- Evaluation tool used in the BigVSAN paper ☆14 · Mar 22, 2024 · Updated 2 years ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc… ☆77 · Jul 16, 2023 · Updated 2 years ago
- ICASSP 2023: 'Speaker recognition with two-step multi-modal deep cleansing' ☆44 · Oct 31, 2022 · Updated 3 years ago
- Official code for Wav2Seq ☆97 · Jul 19, 2022 · Updated 3 years ago