MiniMax-AI / audio-tools
A collection of optimized utilities for text-to-audio processing, enhancing both training and inference workflows. This repository contains robust implementations adapted from open-source libraries.
☆37 · Updated 4 months ago
Alternatives and similar repositories for audio-tools
Users interested in audio-tools are comparing it to the libraries listed below.
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts). ☆81 · Updated last week
- ☆58 · Updated last year
- This is a repository that collects common audio noise reduction models, using Gradio to demonstrate the use of each model, which is very … ☆40 · Updated 7 months ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 · Updated 11 months ago
- Benchmark for evaluating TTS models on complex prosody, expressiveness, and linguistic challenges. ☆114 · Updated last week
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio. ☆169 · Updated last year
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆75 · Updated 10 months ago
- Official code for ParrotTTS ☆53 · Updated 9 months ago
- A fast speech-to-speech and speech-to-text translation model that supports simultaneous decoding and offers a 28× speedup. ☆75 · Updated 9 months ago
- LSLM implements full-duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances … ☆74 · Updated last month
- ☆35 · Updated 2 weeks ago
- Anim-400K: A dataset designed from the ground up for automated dubbing of video ☆109 · Updated last year
- OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model ☆62 · Updated 2 weeks ago
- Code for Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction (ACL24) ☆45 · Updated 11 months ago
- flow mirror models from JZX AI Labs ☆44 · Updated 10 months ago
- An unofficial PyTorch implementation of VALL-E ☆87 · Updated last week
- Audio tokenization, in the fastest way possible! ☆52 · Updated 11 months ago
- GPT-style network for phonemizing text, with durations ☆67 · Updated last year
- ☆40 · Updated 5 months ago
- Code for the ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?". ☆24 · Updated last year
- We Speech Transcript based on an LLM, in 300 lines of code. ☆174 · Updated last month
- ☆13 · Updated last year
- F5-TTS inference acceleration: roughly 4× faster! ☆102 · Updated 6 months ago
- The YouTube Text-To-Speech dataset consists of waveform audio extracted from YouTube videos alongside their English transcriptions ☆51 · Updated 4 years ago
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion ☆96 · Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆73 · Updated 9 months ago
- SenseVoice-python: An enterprise-grade, open-source, multilingual ASR system from the FunASR open-source project, using onnxruntime ☆97 · Updated 10 months ago
- Codec for the paper LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆284 · Updated 2 weeks ago
- SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆215 · Updated 2 months ago
- Awesome TTS ☆59 · Updated 3 years ago