DaiYvhang / AISHELL-5
In-car multi-channel speech transcription system of AISHELL-5.
☆30 · Updated last month
Alternatives and similar repositories for AISHELL-5
Users who are interested in AISHELL-5 are comparing it to the libraries listed below.
- Inference code for Audiodec-Valle-Wenetspeech4TTS ☆50 · Updated 11 months ago
- Faster inference ☆28 · Updated 5 months ago
- Apply score diffusion to improve speech signals recorded under various adverse conditions and distortions, including noise, reverberation… ☆63 · Updated 11 months ago
- PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR". ☆64 · Updated 2 weeks ago
- (WIP) Long-form speech generations ☆31 · Updated 3 months ago
- The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-… ☆38 · Updated last month
- ☆28 · Updated this week
- Production-first, NN-based on-device signal processing toolkit. ☆65 · Updated 2 years ago
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement ☆77 · Updated 3 months ago
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization ☆92 · Updated last month
- Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models ☆46 · Updated 2 months ago
- ☆26 · Updated 2 years ago
- Official repository for Mamba-based Segmentation Model for Speaker Diarization ☆37 · Updated last month
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems. ☆78 · Updated 2 months ago
- A native-PyTorch library for large-scale M-LLM (text/audio) training with tp/cp/dp/pp. ☆100 · Updated this week
- ☆28 · Updated 5 months ago
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement ☆42 · Updated 4 months ago
- A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability. ☆83 · Updated last month
- AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data ☆31 · Updated last year
- A complete RAG system for end-to-end speech-to-speech large models, including ASR-RAG and E2E-RAG. ☆14 · Updated last week
- [ICASSP2023] Source code, model links and open test sets for paper SeACo-Paraformer. ☆31 · Updated last year
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching ☆37 · Updated 5 months ago
- A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5 ☆36 · Updated 3 months ago
- Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis ☆27 · Updated 3 months ago
- A Massive Contextual Speech Recognition Benchmark. ☆53 · Updated this week
- ☆48 · Updated 10 months ago
- Template for creating audio encoders compatible with X-ARES ☆11 · Updated 5 months ago
- Model configurations for scaling SE models in the paper "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enha… ☆33 · Updated 11 months ago
- High fidelity, lightweight, end-to-end, streaming, convolution-based neural audio codec ☆102 · Updated 2 weeks ago
- Streaming Vocos ☆27 · Updated last month