Official PyTorch implementation of (ICME2025 oral) "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis"
☆16Feb 1, 2026Updated last month
Alternatives and similar repositories for AutoStyle-TTS
Users that are interested in AutoStyle-TTS are comparing it to the libraries listed below
- Forced alignment decoder for Whisper.☆14Mar 13, 2024Updated last year
- ☆22Nov 25, 2025Updated 3 months ago
- Repo of the paper "Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model"☆15Jun 28, 2024Updated last year
- Incorporating AutoVocoder to MB-iSTFT-VITS☆48Dec 1, 2022Updated 3 years ago
- ☆15Aug 22, 2025Updated 6 months ago
- Unofficial implementation of ConvNeXt-TTS powered by lightning☆18Oct 20, 2024Updated last year
- Zero-shot voice cloning text-to-speech (TTS) with explicit emotion class conditioning built on F5-TTS☆29Feb 19, 2026Updated last week
- Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments.☆18Aug 1, 2025Updated 7 months ago
- Text-to-Speech Latency Benchmark☆22Jan 16, 2026Updated last month
- Code implementation for the paper titled MusicLIME: Explainable Multimodal Music Understanding☆23Jan 27, 2025Updated last year
- Unofficial PyTorch implementation of VISinger: Variational Inference with Adversarial Learning for End-to-end Singing Voice Synthesis (IC…☆19May 12, 2023Updated 2 years ago
- Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Buil…☆40Jun 17, 2025Updated 8 months ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).☆94Oct 8, 2025Updated 4 months ago
- KittenTTS is an ultra-lightweight, CPU-friendly text-to-speech model with 15M params for real-time, high-quality voices. Open source, fas…☆23Updated this week
- Where is the "main theme" in an orchestral score?☆12Oct 25, 2025Updated 4 months ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval☆13Jun 27, 2025Updated 8 months ago
- Data manipulation and transformation for audio signal processing, powered by PyTorch☆10Sep 30, 2024Updated last year
- ☆32Nov 18, 2025Updated 3 months ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated 10 months ago
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder☆12Mar 11, 2025Updated 11 months ago
- ☆10Sep 2, 2024Updated last year
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …☆28Nov 7, 2025Updated 3 months ago
- A neural speech codec based on discrete WavLM representations☆24Aug 28, 2024Updated last year
- An upgraded framework for training and validation, compared with icefall, using Lightning.☆15Mar 26, 2025Updated 11 months ago
- ☆11May 7, 2022Updated 3 years ago
- OpenFst mirror with some fixes☆14Aug 23, 2024Updated last year
- Using OpenVINO to speed up MeloTTS inference☆15Nov 1, 2024Updated last year
- A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts☆16Dec 3, 2024Updated last year
- The description of the FMFCC-A (audio track of FMFCC) dataset and Challenge results.☆25Apr 14, 2022Updated 3 years ago
- faster inference☆28Jan 20, 2025Updated last year
- StyleTTS 2 Optimized Training Fork☆33Feb 2, 2025Updated last year
- An automatic prosodic boundary annotation tool for Text-to-Speech Synthesis (TTS).☆51Jun 11, 2024Updated last year
- Tracking beer/wine using Audio Event Detection with Machine Learning☆15Jun 16, 2024Updated last year
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆87Dec 20, 2024Updated last year
- StyleTTS2 + Vocos as a Decoder☆13Mar 24, 2025Updated 11 months ago
- This is the experimental description of MnTTS2.☆11Apr 11, 2024Updated last year
- DST is a decoder-only simultaneous machine translation model that makes policy decisions and translates concurrently☆11Jun 6, 2024Updated last year
- Zero-Shot Foreign Accent Conversion without a Native Reference☆36May 1, 2024Updated last year
- Sing any popular song with your voice☆11Jul 10, 2022Updated 3 years ago