Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆435 · Nov 27, 2025 · Updated 3 months ago
Alternatives and similar repositories for Ming-UniAudio
Users who are interested in Ming-UniAudio are comparing it to the libraries listed below.
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ☆134 · Sep 19, 2025 · Updated 5 months ago
- Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice ☆507 · Dec 22, 2025 · Updated 2 months ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆977 · Updated this week
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆241 · Dec 18, 2025 · Updated 2 months ago
- [ICASSP 2024] TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models ☆183 · Nov 22, 2024 · Updated last year
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆197 · Jan 25, 2026 · Updated last month
- Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting" ☆110 · Oct 16, 2025 · Updated 4 months ago
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis" ☆108 · Dec 20, 2025 · Updated 2 months ago
- A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability. ☆113 · Jun 4, 2025 · Updated 9 months ago
- [INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment" ☆64 · Jun 16, 2025 · Updated 8 months ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications ☆87 · Dec 20, 2024 · Updated last year
- [NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching ☆121 · Mar 27, 2025 · Updated 11 months ago
- ☆33 · Nov 18, 2025 · Updated 3 months ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3 ☆434 · Sep 13, 2024 · Updated last year
- Text-To-Speech for NotebookLM ☆39 · Jul 20, 2025 · Updated 7 months ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆311 · Aug 4, 2025 · Updated 7 months ago
- [ICLR 2026] An official implementation of "STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence" ☆40 · Jan 17, 2026 · Updated last month
- GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning ☆938 · Dec 17, 2025 · Updated 2 months ago
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs ☆57 · Nov 19, 2025 · Updated 3 months ago
- ☆80 · Aug 11, 2025 · Updated 6 months ago
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆293 · Oct 12, 2025 · Updated 4 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆254 · Mar 26, 2025 · Updated 11 months ago
- AcademiCodec: An Open Source Audio Codec Model for Academic Research ☆669 · Dec 27, 2023 · Updated 2 years ago
- The open source code for SimpleSpeech series ☆145 · Oct 8, 2024 · Updated last year
- FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice. ☆242 · Feb 25, 2026 · Updated last week
- TTS FrontEnd DataSet: Polyphone / Prosody / TextNormalization ☆103 · Feb 5, 2024 · Updated 2 years ago
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner. ☆119 · Oct 17, 2025 · Updated 4 months ago
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model ☆995 · Jan 15, 2026 · Updated last month
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets' ☆155 · Mar 24, 2025 · Updated 11 months ago
- A Survey of Spoken Dialogue Models (60 pages) ☆315 · Nov 28, 2024 · Updated last year
- An Open-Source Project to Unify Audio Processing and Generation ☆260 · Updated this week
- ☆36 · Sep 6, 2025 · Updated 6 months ago
- Awesome speech/audio LLMs, representation learning, and codec models ☆1,210 · Aug 13, 2025 · Updated 6 months ago
- [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In… ☆45 · Mar 25, 2024 · Updated last year
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆152 · Sep 14, 2023 · Updated 2 years ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆218 · Feb 28, 2025 · Updated last year
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate ☆748 · Nov 19, 2024 · Updated last year
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024) ☆26 · Feb 22, 2024 · Updated 2 years ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆62 · Nov 1, 2024 · Updated last year