FunAudioLLM/Fun-Audio-Chat

Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.

☆973

Alternatives and similar repositories for Fun-Audio-Chat

Users interested in Fun-Audio-Chat often compare it to the repositories listed below.

zai-org / GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
☆1,030 · Apr 10, 2026 · Updated 2 months ago
EIT-NLP / LLaSO
☆116 · Oct 21, 2025 · Updated 8 months ago
stepfun-ai / Step-Audio2
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…
☆1,466 · Mar 16, 2026 · Updated 3 months ago
XiaomiMiMo / MiMo-Audio
MiMo-Audio: Audio Language Models are Few-Shot Learners
☆1,056 · Jun 17, 2026 · Updated 2 weeks ago
KexinHUANG19 / InstructTTSEval
☆48 · Jun 25, 2025 · Updated last year
X-LANCE / SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,040 · Jan 15, 2026 · Updated 5 months ago
ddlBoJack / Awesome-Speech-Language-Model
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
☆202 · Jun 7, 2026 · Updated 3 weeks ago
SJTU-OmniAgent / VocalNet
☆123 · May 18, 2026 · Updated last month
ASLP-lab / MeanVC
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
☆288 · Jan 8, 2026 · Updated 5 months ago
xingchensong / FlashCosyVoice
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
☆251 · Feb 25, 2026 · Updated 4 months ago
Tencent / Covo-Audio
Covo-Audio is a 7B-parameter end-to-end large audio language model that directly processes continuous audio inputs and generates audio ou…
☆172 · Mar 17, 2026 · Updated 3 months ago
xcc-zach / xtalk
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech…
☆216 · Updated this week
Soul-AILab / SoulX-Duplug
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
☆259 · Mar 20, 2026 · Updated 3 months ago
yynil / RWKVTTS
This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (such as fish/cosy/chattts).
☆99 · Oct 8, 2025 · Updated 8 months ago
alibaba / vstyle
☆33 · Sep 15, 2025 · Updated 9 months ago
roudimit / Omni-R1
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
☆47 · Nov 21, 2025 · Updated 7 months ago
xingchensong / TouchNet
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
☆230 · Updated this week
xiquan-li / MeanAudio
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆140 · Sep 2, 2025 · Updated 10 months ago
modelscope / FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…
☆443 · Jan 25, 2024 · Updated 2 years ago
vivian556123 / NeurIPS2024-CoVoMix
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
☆67 · Jan 16, 2025 · Updated last year
FireRedTeam / FireRedTTS2
Long-form streaming TTS system for multi-speaker dialogue generation
☆1,409 · Oct 26, 2025 · Updated 8 months ago
inclusionAI / Ming-omni-tts
Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control
☆259 · Feb 26, 2026 · Updated 4 months ago
NKU-HLT / DIFFA
[AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model
☆82 · Apr 7, 2026 · Updated 2 months ago
OpenMOSS / MOSS-TTSD
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flex…
☆1,355 · Mar 23, 2026 · Updated 3 months ago
xingchensong / S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
☆513 · Dec 22, 2025 · Updated 6 months ago
ASLP-lab / OSUM
OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.
☆494 · Nov 23, 2025 · Updated 7 months ago
meituan-longcat / LongCat-Audio-Codec
LongCat Audio Tokenizer and Detokenizer
☆303 · May 9, 2026 · Updated last month
MoonshotAI / Kimi-Audio
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
☆4,658 · Jun 21, 2025 · Updated last year
the-bird-F / GLM-Voice-RAG
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31 · Jul 11, 2025 · Updated 11 months ago
ASLP-lab / VoiceSculptor
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
☆251 · Feb 26, 2026 · Updated 4 months ago
DanielLin94144 / Full-Duplex-Bench
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
☆213 · May 20, 2026 · Updated last month
GAIR-NLP / LiveTalk
☆320 · Jan 2, 2026 · Updated 6 months ago
MatthewCYM / VoiceBench
[TACL'26] VoiceBench: Benchmarking LLM-Based Voice Assistants
☆376 · Jun 11, 2026 · Updated 3 weeks ago
byteresearchcla / RealSI
RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios
☆82 · Jul 4, 2025 · Updated last year
FunAudioLLM / Fun-ASR
Fun-ASR-Nano LLM-ASR model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, and speaker diarization.
☆1,331 · Updated this week
MRSAudio / MRSAudio_Main
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
☆41 · Oct 15, 2025 · Updated 8 months ago
zai-org / GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
☆3,202 · Dec 5, 2024 · Updated last year
the-bird-F / Expressive-Vectors
[ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis
☆40 · Dec 24, 2025 · Updated 6 months ago
Ruiqi-Yan / Awesome-Full-Duplex-SDM
A curated list of full-duplex spoken dialogue models & benchmarks
☆105 · Jun 25, 2026 · Updated last week