TriNet: stabilizing self-supervised learning from complete or slow collapse on ASR.
☆26Jun 1, 2023Updated 2 years ago
Alternatives and similar repositories for TriNet
Users who are interested in TriNet are comparing it to the repositories listed below
- ☆15Apr 2, 2025Updated 10 months ago
- ☆37Jun 30, 2022Updated 3 years ago
- Voice conversion training with 109 speakers with limited training samples☆35Dec 21, 2020Updated 5 years ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Jan 27, 2025Updated last year
- ☆33Nov 27, 2021Updated 4 years ago
- ☆10Sep 2, 2024Updated last year
- WeNet speech-to-text for React Native☆10Nov 1, 2022Updated 3 years ago
- [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In…☆45Mar 25, 2024Updated last year
- This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. …☆13Dec 4, 2024Updated last year
- Variable Bitrate Residual Vector Quantization for Audio Coding☆51May 1, 2025Updated 10 months ago
- silero-vad PyTorch implementation☆35Nov 23, 2024Updated last year
- Source code for "BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement"☆14Feb 13, 2022Updated 4 years ago
- ☆11Mar 22, 2023Updated 2 years ago
- UIE (Universal Information Extraction) inference with ncnn☆15Sep 22, 2024Updated last year
- ☆26Apr 21, 2021Updated 4 years ago
- Lightweight speaker anonymization [IEEE SLT2021]☆27Jun 6, 2022Updated 3 years ago
- faster inference☆28Jan 20, 2025Updated last year
- ☆13Mar 11, 2025Updated 11 months ago
- S3PRL for Speech Emotion Recognition (see s3prl > downstream)☆15Updated this week
- Code for the paper "JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis"☆14Nov 5, 2024Updated last year
- Spherical residual vector quantization (SRVQ)☆31Aug 25, 2024Updated last year
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11…☆46Jul 2, 2024Updated last year
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.☆119Oct 17, 2025Updated 4 months ago
- AudioStretchy is a Python wrapper around the `audio-stretch` C library, which performs fast, high-quality time-stretching of WAV/MP3 file…☆61Sep 24, 2025Updated 5 months ago
- ☆67Aug 16, 2023Updated 2 years ago
- ☆36Sep 6, 2025Updated 5 months ago
- A robust pitch tracker using synchro-squeezed fft and frequency domain autocorrelation☆36Jan 17, 2024Updated 2 years ago
- Implementation of Google's USM speech model in Pytorch☆35Feb 7, 2026Updated 3 weeks ago
- ☆17Apr 28, 2021Updated 4 years ago
- End-to-end speech recognition implementation; includes LAS, CTC, and RNNT decoding methods, with models such as SA (MHA), LSTM, CNN, and DFSMN☆15Jun 4, 2021Updated 4 years ago
- ☆16Apr 4, 2022Updated 3 years ago
- This is the implementation of the manuscript "Learning General All-Neural Speech Enhancement based on Taylor's Approximation Theory", whi…☆14Nov 25, 2022Updated 3 years ago
- Forced alignment decoder for Whisper.☆14Mar 13, 2024Updated last year
- ☆35Feb 10, 2026Updated 2 weeks ago
- ☆14Jun 12, 2015Updated 10 years ago
- ☆16Sep 12, 2019Updated 6 years ago
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition.☆15May 16, 2025Updated 9 months ago
- LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for editing and controlling the target language's phoneme…☆23Aug 14, 2025Updated 6 months ago
- ☆61Nov 4, 2023Updated 2 years ago