[AutoArk] GPA (General Purpose Audio) can do ASR, TTS and voice conversion with one tiny 300M model!
☆86Jan 29, 2026Updated last month
Alternatives and similar repositories for GPA
Users who are interested in GPA are comparing it to the libraries listed below
- LEMAS‑TTS is a multilingual zero‑shot text‑to‑speech system, supporting 10 languages: Chinese, English, Spanish, Russian, French, German, Ital…☆91Jan 14, 2026Updated last month
- Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems☆80Jan 25, 2026Updated last month
- TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks☆22Jan 19, 2026Updated last month
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆62Sep 5, 2025Updated 5 months ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆16Nov 19, 2025Updated 3 months ago
- ☆11Nov 7, 2024Updated last year
- semantic tokenizer for speech and music☆21Jul 6, 2025Updated 7 months ago
- Official implementation of the WildFX dataset generation pipeline.☆15Oct 21, 2025Updated 4 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆197Jan 25, 2026Updated last month
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆105May 5, 2025Updated 9 months ago
- ☆17Jul 23, 2025Updated 7 months ago
- Aligntune: A Modular Toolkit for Post-Training Alignment of LLMs☆33Updated this week
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).☆94Oct 8, 2025Updated 4 months ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'☆154Mar 24, 2025Updated 11 months ago
- Read articles, explore effectiveness metrics for speech enhancement methodologies. Seamlessly integrate code implementations for better u…☆26Apr 19, 2024Updated last year
- Python Wrapper of Silero VAD☆64May 8, 2025Updated 9 months ago
- ☆23Feb 2, 2022Updated 4 years ago
- ☆30Sep 15, 2025Updated 5 months ago
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…☆27Jul 11, 2025Updated 7 months ago
- UTAUTAI(Unrestricted Tune Automated Technology Artificial Interigence)☆15Oct 27, 2023Updated 2 years ago
- ☆33Aug 6, 2021Updated 4 years ago
- ☆99Jan 19, 2026Updated last month
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs☆77Dec 3, 2025Updated 2 months ago
- This repository contains the training code from paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without…☆50Feb 4, 2026Updated 3 weeks ago
- Digital Audio Effects in Python (material for MUSI6202@Georgiatech)☆15Nov 30, 2014Updated 11 years ago
- MFA acoustic model training based on Opencpop☆15Sep 23, 2022Updated 3 years ago
- noise reduction☆17Jul 3, 2024Updated last year
- Training, inference, and testing of the SAC speech codec model.☆99Nov 1, 2025Updated 3 months ago
- Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control☆96Feb 18, 2026Updated last week
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- Official PyTorch inference code for the Interspeech 2025 paper: Efficient Speech Enhancement via Embeddings from Pre-trained Generative A…☆75Jun 16, 2025Updated 8 months ago
- [ICML 2025 Tokenization Workshop] HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling☆78Sep 28, 2025Updated 5 months ago
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion☆34Sep 9, 2025Updated 5 months ago
- MOSS-Speech is a true speech-to-speech large language model without text guidance.☆124Feb 13, 2026Updated 2 weeks ago
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …☆92Dec 28, 2024Updated last year
- ☆49Feb 12, 2026Updated 2 weeks ago
- Self-supervised Generative LM-based Voice Conversion☆54Apr 24, 2025Updated 10 months ago
- An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.☆220Jan 20, 2026Updated last month
- ☆97Oct 16, 2025Updated 4 months ago