MiniMax-AI/audio-tools

A collection of optimized utilities for text-to-audio processing, enhancing both training and inference workflows. This repository contains robust implementations adapted from open-source libraries.

☆49

Alternatives and similar repositories for audio-tools

Users who are interested in audio-tools are comparing it to the libraries listed below.

ShawnPi233 / SynParaSpeech
Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (IC…
☆72 · Apr 27, 2026 · Updated 2 months ago

5Hyeons / StyleTTS2-Vocos
StyleTTS2 + Vocos as a Decoder
☆13 · Mar 24, 2025 · Updated last year

THUsatlab / BERT-LID
Leveraging BERT to Improve Spoken Language Identification
☆17 · Nov 22, 2022 · Updated 3 years ago

HeCheng0625 / Diffusion-Speech-Tokenizer
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198 · Jan 25, 2026 · Updated 5 months ago

Lexsi-Labs / aligntune
Aligntune: A Modular Toolkit for Post Training Alignment of LLMs
☆37 · Jun 26, 2026 · Updated last week

beiciliang / estimate-f0-inharmonicity
Estimate the fundamental frequency and inharmonicity coefficient of an isolated piano note
☆11 · Jan 1, 2018 · Updated 8 years ago

anshuman23 / InfDataSel
Code for paper: “What Data Benefits My Classifier?” Enhancing Model Performance and Interpretability through Influence-Based Data Selecti…
☆23 · May 17, 2024 · Updated 2 years ago

alibaba / vstyle
☆33 · Sep 15, 2025 · Updated 9 months ago

kamperh / linearvc
Voice conversion with just linear regression.
☆37 · Sep 25, 2025 · Updated 9 months ago

tensorchord / ai-infra-statistics
This repository contains statistics about the AI Infrastructure products.
☆16 · Feb 27, 2025 · Updated last year

pengzhendong / torchfa
Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.
☆61 · Sep 5, 2025 · Updated 10 months ago

vivian556123 / NeurIPS2024-CoVoMix
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
☆67 · Jan 16, 2025 · Updated last year

NKU-HLT / DIFFA
[AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model
☆82 · Apr 7, 2026 · Updated 3 months ago

Ruiqi-Yan / URO-Bench
Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
☆55 · Sep 2, 2025 · Updated 10 months ago

ZFancy / DivOE
[NeurIPS 2023] "Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation"
☆11 · Oct 6, 2023 · Updated 2 years ago

R1ckShi / FrontEnd-AEC
Acoustic echo cancellation (AEC) is a main algorithm in the pipeline of acoustic devices with KWS or ASR. FNLMS is used.
☆19 · Apr 22, 2019 · Updated 7 years ago

Respaired / RiFornet_Vocoder
A Neural Vocoder supporting Ring Attention, Conformer and NSF.
☆25 · Aug 1, 2025 · Updated 11 months ago

ElvishElvis / LCA-on-the-line
LCA-on-the-line (ICML 2024 Oral)
☆14 · Feb 13, 2025 · Updated last year

xiaohua-chen / AREA
☆14 · Dec 28, 2023 · Updated 2 years ago

davanstrien / huggingface-tldr
Experimental tl;dr summaries for datasets on the Hugging Face Hub!
☆10 · Apr 4, 2024 · Updated 2 years ago

chenpipi0807 / LTX-Video-Trainer-GUI
LTX-Video-Trainer-GUI is a GUI tool for training LTX video LoRA models; it supports training LoRA models for video generation through a simple interface. The trainer provides an intuitive GUI that lets users easily configure and launch the training process without writing complex code.
☆13 · Jul 18, 2025 · Updated 11 months ago

fansunqi / AKeyS
Agentic Keyframe Search for Video Question Answering
☆18 · Jun 30, 2026 · Updated last week

SwanHubX / glm4-finetune
Introduction to fine-tuning ChatGLM4
☆27 · Apr 8, 2025 · Updated last year

ZHANG1023 / FED-NeRF
☆22 · Jan 10, 2024 · Updated 2 years ago

mbortolon97 / IFFNeRF
Code for the paper "IFFNeRF: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model"
☆12 · May 26, 2024 · Updated 2 years ago

DandanGuo1993 / reweight-imbalance-classification-with-OT
☆13 · Nov 8, 2022 · Updated 3 years ago

jishengpeng / ControlSpeech
[ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
☆276 · Nov 22, 2024 · Updated last year

xiaomi-research / dasheng-glap
Official Implementation of GLAP - General Language Audio Pretraining
☆72 · May 14, 2026 · Updated last month

unilight / jatts
JATTS: A modern, research-oriented Japanese Text-to-speech Open-sourced Toolkit
☆43 · Mar 13, 2026 · Updated 3 months ago

zy445566 / tfjs-tutorials-zh
Chinese-language guide to tfjs (tensorflow.js), with some of the author's own code and insights added
☆20 · Nov 23, 2018 · Updated 7 years ago

zerohd4869 / SPC
The official repository for AAAI 2024 Oral paper "Structured Probabilistic Coding".
☆13 · Sep 7, 2024 · Updated last year

tmlr-group / DAL
[NeurIPS 2023] "Learning to Augment Distributions for Out-of-distribution Detection"
☆11 · Nov 14, 2023 · Updated 2 years ago

KVDmitrieva / source_sep_hifi
☆20 · Jun 29, 2025 · Updated last year

DoodleBears / split-lang
✨ Split text by languages (e.g. 你喜欢看アニメ吗 -> 你喜欢看 | アニメ | 吗) for NLP tasks (e.g. parse, TTS). Powered by fasttext and budoux
☆74 · Sep 18, 2025 · Updated 9 months ago

weimingtom / wmt_ai_study
My AI study
☆27 · Jun 6, 2026 · Updated last month

annahdo / counterfactuals
☆14 · Dec 4, 2023 · Updated 2 years ago

MoonshotAI / Kimi-Audio-Evalkit
☆168 · Nov 20, 2025 · Updated 7 months ago

yl4579 / HiFTNet
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
☆254 · Jan 14, 2025 · Updated last year

yitongdeng-projects / infinite_resolution_integral_noise_warping_code
Official Implementation of Infinite-Resolution Integral Noise Warping for Diffusion Models [ICLR 2025]
☆16 · Mar 15, 2025 · Updated last year