AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
☆35Dec 31, 2023Updated 2 years ago
Alternatives and similar repositories for AutoPrepDemo
Users who are interested in AutoPrepDemo are comparing it to the libraries listed below.
- ☆15Mar 12, 2024Updated 2 years ago
- Speech samples and code of BEdit-TTS☆34Oct 8, 2023Updated 2 years ago
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated 11 months ago
- PyTorch implementation of Retriever: Learning Content-Style Representation☆12Jan 27, 2023Updated 3 years ago
- [APSIPA'22] Exploring Speaker Age Estimation on Different Self-Supervised Learning Models☆14Oct 19, 2022Updated 3 years ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆62Nov 1, 2024Updated last year
- ☆12Jul 23, 2024Updated last year
- ICASSP2022 TTS&VC Summary☆14Jun 9, 2022Updated 3 years ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆90Dec 20, 2024Updated last year
- some papers about Kalman Filter☆14Sep 4, 2019Updated 6 years ago
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement☆187Sep 1, 2025Updated 6 months ago
- Adaptive Multimodal Reasoning via Reinforcement Learning☆23Jan 11, 2026Updated 2 months ago
- TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization☆103Feb 5, 2024Updated 2 years ago
- [AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS☆64Nov 18, 2024Updated last year
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.☆22May 26, 2025Updated 9 months ago
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…☆442Jan 25, 2024Updated 2 years ago
- It's a repository for implementations of neural speech editing algorithms.☆204Jan 9, 2024Updated 2 years ago
- A solution to denoising and separating for two-speaker-mixed noisy speech, using a BSRNN inspired network.☆15Aug 22, 2023Updated 2 years ago
- ☆15Sep 6, 2021Updated 4 years ago
- ☆12Jun 14, 2022Updated 3 years ago
- Multi-Task Speech classification of accent and gender of an English speaker on Mozilla's Common Voice dataset☆27May 30, 2025Updated 9 months ago
- [ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training fo…☆1,081Dec 23, 2024Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆114Jan 28, 2026Updated last month
- ☆111Mar 9, 2026Updated 2 weeks ago
- TTS Text Analyzer☆31Jul 20, 2023Updated 2 years ago
- Implementation of StyleTTS for Mandarin☆11Jun 22, 2023Updated 2 years ago
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model☆1,011Jan 15, 2026Updated 2 months ago
- The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…☆63Dec 26, 2025Updated 2 months ago
- ☆56Jul 17, 2023Updated 2 years ago
- Python wrappers for Kaldi Levenshtein's distance and alignment code.☆68Jan 5, 2026Updated 2 months ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat…☆33Jun 14, 2024Updated last year
- ☆22Jul 16, 2025Updated 8 months ago
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆482Nov 23, 2025Updated 4 months ago
- Byte-based multilingual transformer TTS for low-resource/few-shot language adaptation.☆87Jul 25, 2022Updated 3 years ago
- Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis☆1,086Aug 7, 2024Updated last year
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.☆48Sep 4, 2023Updated 2 years ago
- An unofficial PyTorch implementation of Mix-Phoneme-Bert☆40Jul 10, 2023Updated 2 years ago
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- Compute WER and SER for speech recognition evaluation☆27Mar 18, 2026Updated last week