AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
☆35Dec 31, 2023Updated 2 years ago
Alternatives and similar repositories for AutoPrepDemo
Users that are interested in AutoPrepDemo are comparing it to the libraries listed below.
- ☆16Mar 12, 2024Updated 2 years ago
- Speech samples and code of BEdit-TTS☆34Oct 8, 2023Updated 2 years ago
- Models and code for the INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated last year
- PyTorch implementation of Retriever: Learning Content-Style Representation☆12Jan 27, 2023Updated 3 years ago
- [APSIPA'22] Exploring Speaker Age Estimation on Different Self-Supervised Learning Models☆14Oct 19, 2022Updated 3 years ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆62Nov 1, 2024Updated last year
- ☆12Jul 23, 2024Updated last year
- ICASSP2022 TTS&VC Summary☆14Jun 9, 2022Updated 3 years ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆90Dec 20, 2024Updated last year
- Some papers about the Kalman filter☆15Sep 4, 2019Updated 6 years ago
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement☆190Updated this week
- Adaptive Multimodal Reasoning via Reinforcement Learning☆23Jan 11, 2026Updated 3 months ago
- TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization☆103Feb 5, 2024Updated 2 years ago
- [AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS☆64Nov 18, 2024Updated last year
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.☆22May 26, 2025Updated 11 months ago
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…☆443Jan 25, 2024Updated 2 years ago
- It's a repository for implementations of neural speech editing algorithms.☆205Jan 9, 2024Updated 2 years ago
- A solution for denoising and separating two-speaker mixed noisy speech, using a BSRNN-inspired network.☆15Aug 22, 2023Updated 2 years ago
- ☆15Sep 6, 2021Updated 4 years ago
- ☆12Jun 14, 2022Updated 3 years ago
- Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset☆28May 30, 2025Updated 11 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆118Jan 28, 2026Updated 3 months ago
- ☆112Mar 9, 2026Updated last month
- [ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training fo…☆1,113Dec 23, 2024Updated last year
- TTS Text Analyzer☆31Jul 20, 2023Updated 2 years ago
- Implementation of StyleTTS for Mandarin☆11Jun 22, 2023Updated 2 years ago
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model☆1,027Jan 15, 2026Updated 3 months ago
- ☆56Jul 17, 2023Updated 2 years ago
- Python wrappers for Kaldi's Levenshtein distance and alignment code.☆68Apr 9, 2026Updated 3 weeks ago
- The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…☆65Dec 26, 2025Updated 4 months ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat…☆33Jun 14, 2024Updated last year
- ☆23Jul 16, 2025Updated 9 months ago
- Byte-based multilingual transformer TTS for low-resource/few-shot language adaptation.☆87Jul 25, 2022Updated 3 years ago
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆491Nov 23, 2025Updated 5 months ago
- Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis☆1,117Aug 7, 2024Updated last year
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.☆48Sep 4, 2023Updated 2 years ago
- An unofficial PyTorch implementation of Mix-Phoneme-Bert☆40Jul 10, 2023Updated 2 years ago
- This repository contains prompts & best practices to annotate audio clips with a very high degree of detail using Audio-Language-Models☆35Oct 13, 2024Updated last year
- Compute WER and SER for speech recognition evaluation☆27Mar 18, 2026Updated last month