AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
☆35Dec 31, 2023Updated 2 years ago
Alternatives and similar repositories for AutoPrepDemo
Users that are interested in AutoPrepDemo are comparing it to the libraries listed below
Sorting:
- ☆12Jul 23, 2024Updated last year
- [APSIPA'22] Exploring Speaker Age Estimation on Different Self-Supervised Learning Models☆14Oct 19, 2022Updated 3 years ago
- Speech samples and code of BEdit-TTS☆34Oct 8, 2023Updated 2 years ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆87Dec 20, 2024Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆62Nov 1, 2024Updated last year
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.☆21May 26, 2025Updated 9 months ago
- Python wrappers for Kaldi Levenshtein's distance and alignment code.☆68Jan 5, 2026Updated last month
- Multi-Task Speech classification of accent and gender of an english speaker on Mozilla's common voice dataset☆27May 30, 2025Updated 9 months ago
- Adaptive Multimodal Reasoning via Reinforcement Learning☆23Jan 11, 2026Updated last month
- PyTorch implementation of Retriever: Learning Content-Style Representation☆12Jan 27, 2023Updated 3 years ago
- TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization☆103Feb 5, 2024Updated 2 years ago
- It's a repository for implementations of neural speech editing algorithms.☆203Jan 9, 2024Updated 2 years ago
- Official PyTorch implementation of (ICME2025 oral) "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-…☆16Feb 1, 2026Updated last month
- TTS Text Analyzer☆31Jul 20, 2023Updated 2 years ago
- Automatic speech annotator processing speech with voice activaty detection, overlapping speech detection, speaker diarization and automat…☆33Jun 14, 2024Updated last year
- ☆111Apr 6, 2022Updated 3 years ago
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated 11 months ago
- A simple implementation for improving CosyVoice2 by GRPO method☆32Oct 17, 2025Updated 4 months ago
- Implementation of StyleTTS for Mandarin☆11Jun 22, 2023Updated 2 years ago
- using world vocoder to extract features and make data for training neural networks☆11Oct 9, 2017Updated 8 years ago
- Official PyTorch implementation of "EdVAE: Mitigating Codebook Collapse with Evidential Discrete Variational Autoencoders"☆14Sep 20, 2024Updated last year
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆114Jan 28, 2026Updated last month
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement☆185Sep 1, 2025Updated 6 months ago
- ICASSP2022 TTS&VC Summary☆14Jun 9, 2022Updated 3 years ago
- VocalVerse: A powerful vocal evaluation framework powered by the Qwen LLMs☆38Jan 22, 2026Updated last month
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023☆12May 13, 2024Updated last year
- Evaluation tool used in the BigVSAN paper☆14Mar 22, 2024Updated last year
- Official code implementation for the work Preference Alignment with Flow Matching (NeurIPS 2024)☆66Nov 8, 2024Updated last year
- Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)☆59Jun 20, 2024Updated last year
- [AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS☆64Nov 18, 2024Updated last year
- Code for "Distribution-based Emotion Recognition in Conversation"☆19Feb 6, 2023Updated 3 years ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆68Nov 1, 2024Updated last year
- This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…☆20Jan 3, 2023Updated 3 years ago
- Compute WER and SER for speech recognition evaluation☆26Dec 15, 2025Updated 2 months ago
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆482Nov 23, 2025Updated 3 months ago
- An unofficial PyTorch implementation of Mix-Phoneme-Bert☆40Jul 10, 2023Updated 2 years ago
- Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (…☆61Apr 4, 2024Updated last year
- ☆15Sep 6, 2021Updated 4 years ago