alibaba/unified-audio

An Open-Source Project to Unify Audio Processing and Generation

☆482

Alternatives and similar repositories for unified-audio

Users who are interested in unified-audio are comparing it to the libraries listed below.

Magic-meet / retrieve-pro-plus
☆31 · May 25, 2026 · Updated last month
Githubhgh / UMF_CVPR
Code Implementation for "Unified Number-Free Text-to-Motion Generation Via Flow Matching" (CVPR26)
☆35 · Jun 6, 2026 · Updated last month
bo-miao / LangMap
LangMap: A Human-Verified Benchmark for Hierarchical Open-Vocabulary Goal Navigation
☆49 · Jun 3, 2026 · Updated last month
LocoreMind / locoagent
AI-powered social media agent with real browser automation
☆1,032 · Jun 27, 2026 · Updated 3 weeks ago
Kevin-naticl / LLaSE-G1
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
☆105 · Apr 1, 2025 · Updated last year
hyyan2k / PGUSE
This is the official implementation of PGUSE
☆41 · Jun 7, 2025 · Updated last year
KdaiP / DC-Speech-VAE
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57 · Nov 19, 2025 · Updated 8 months ago
inclusionAI / Ming-UniAudio
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆450 · Nov 27, 2025 · Updated 7 months ago
ASLP-lab / SenSE
Official code of SenSE.
☆90 · Oct 30, 2025 · Updated 8 months ago
Dahan-Wang / Rethinking-Flow-and-Diffusion-Bridge-Models-for-Speech-Enhancement
☆39 · Feb 23, 2026 · Updated 5 months ago
yanghaha0908 / WavCube
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"
☆62 · Jun 27, 2026 · Updated 3 weeks ago
cisco-open / pase
PASE: Phonologically Anchored Speech Enhancer
☆86 · Jul 15, 2026 · Updated last week
Ruiqi-Yan / Awesome-Audio-Editing
A curated list of models, benchmarks, tools and guides for audio editing
☆34 · Jul 7, 2026 · Updated 2 weeks ago
Soul-AILab / SAC
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108 · Nov 1, 2025 · Updated 8 months ago
xiaomi-research / dasheng-audiogen
end-to-end text to audio scene generation model
☆50 · Jun 16, 2026 · Updated last month
xiquan-li / MeanAudio
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆142 · Sep 2, 2025 · Updated 10 months ago
XiaomiMiMo / MiMo-Audio-Tokenizer
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
☆145 · Sep 19, 2025 · Updated 10 months ago
k2-fsa / Flow2GAN
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
☆145 · Mar 8, 2026 · Updated 4 months ago
HeCheng0625 / Diffusion-Speech-Tokenizer
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198 · Jan 25, 2026 · Updated 6 months ago
OpenMOSS / MOSS-Audio-Tokenizer
MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, i…
☆248 · Jun 16, 2026 · Updated last month
Clovermax / AED-TSVAD
Attention-Based Encoder-Decoder Target-Speaker Voice Activity Detection for Robust Speaker Diarization
☆31 · Sep 22, 2025 · Updated 10 months ago
hyyan2k / LiSenNet
This is the official implementation of the LiSenNet
☆162 · Nov 15, 2024 · Updated last year
Jokejiangv / LABNet
The code about “LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhance…
☆49 · Oct 10, 2025 · Updated 9 months ago
Xiaobin-Rong / ul-unas
The official repo of UL-UNAS, an ultra-lightweight SE model.
☆192 · Jun 17, 2026 · Updated last month
ASLP-lab / VoiceSculptor
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
☆250 · Feb 26, 2026 · Updated 4 months ago
wsntxxn / UniFlow-Audio
☆73 · Jul 17, 2026 · Updated last week
Xiaobin-Rong / gtcrn
The official implementation of GTCRN, an ultra-lightweight SE model.
☆695 · Jan 18, 2026 · Updated 6 months ago
gyt1145028706 / XY-Tokenizer
This is the code for paper: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs
☆97 · Sep 19, 2025 · Updated 10 months ago
wenet-e2e / west
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206 · Jul 17, 2026 · Updated last week
XZWY / SpatialCodec
Implementation of SpatialCodec.
☆71 · Sep 23, 2023 · Updated 2 years ago
aask1357 / fastenhancer
Speed-optimized streaming neural speech enhancement network
☆136 · Jul 3, 2026 · Updated 3 weeks ago
kandinskylab / kvae-audio
KVAE-Audio: a continuous full-band audio waveform autoencoder
☆101 · Updated this week
facebookresearch / FlowDec
A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.
☆212 · Jun 22, 2026 · Updated last month
xiaomi-research / tts-prism
☆47 · Apr 27, 2026 · Updated 2 months ago
Ereboas / MagiCodec
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆125 · Jun 4, 2025 · Updated last year
AmphionTeam / FlexiCodec
[ICLR2026] FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
☆50 · Jul 1, 2026 · Updated 3 weeks ago
yfyeung / CLSP
[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
☆104 · Apr 6, 2026 · Updated 3 months ago
ASLP-lab / MeanVC
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
☆298 · Jan 8, 2026 · Updated 6 months ago
Andong-Li-speech / BridgeVoC
This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".
☆67 · Nov 5, 2025 · Updated 8 months ago