FunAudioLLM/Fun-ASR

Fun-ASR-Nano LLM-ASR model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, and speaker diarization.

☆1,331

Alternatives and similar repositories for Fun-ASR

Users who are interested in Fun-ASR are comparing it to the libraries listed below.

yfyeung / CLSP
[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
☆104 · Apr 6, 2026 · Updated 2 months ago
pengzhendong / asr-decoder
CTC decoder with hotwords for ASR.
☆36 · Jun 15, 2026 · Updated 2 weeks ago
yuekaizhang / Fun-ASR-vllm
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
☆102 · May 26, 2026 · Updated last month
zai-org / GLM-ASR
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
☆819 · Mar 6, 2026 · Updated 3 months ago
xiquan-li / MeanAudio
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆140 · Sep 2, 2025 · Updated 10 months ago
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,922 · Feb 25, 2026 · Updated 4 months ago
QwenLM / Qwen3-ASR
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music…
☆2,998 · Jun 26, 2026 · Updated last week
stepfun-ai / Step-Audio2
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…
☆1,466 · Mar 16, 2026 · Updated 3 months ago
xcc-zach / xtalk
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech…
☆216 · Updated this week
FireRedTeam / FireRedASR2S
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/…
☆572 · Jun 2, 2026 · Updated last month
xingchensong / TouchNet
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
☆230 · Updated this week
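fengin / Fun-ASR-Nano-2512-Deploy
The official Fun-ASR-Nano-2512 repository contains a lot of material and deploying it has quite a few pitfalls; this project provides a simplified deployment solution.
☆150 · Dec 26, 2025 · Updated 6 months ago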
Audio-Reasoning-Challenge / Audio-Reasoning-Challenge-Baselines
The baselines of ARC-Challenge-Interspeech2026
☆60 · Dec 1, 2025 · Updated 7 months ago
FunAudioLLM / SenseVoice
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoreg…
☆8,713 · Updated this week
ictnlp / SLED-TTS
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
☆109 · May 20, 2025 · Updated last year
modelscope / FunASR
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-…
☆18,699 · Updated this week
k2-fsa / ZipVoice
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆994 · Dec 2, 2025 · Updated 7 months ago
FireRedTeam / FireRedVAD
A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, F…
☆439 · May 6, 2026 · Updated last month
FunAudioLLM / Fun-Audio-Chat
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
☆973 · Feb 27, 2026 · Updated 4 months ago
pengzhendong / streaming-asr
One command to start a streaming ASR server.
☆12 · Oct 2, 2024 · Updated last year
boson-ai / EmergentTTS-Eval-public
[NeurIPS '25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.
☆224 · Dec 9, 2025 · Updated 6 months ago
IDEA-Emdoor-Lab / UniTTS
A TTS model trained on universal audio.
☆41 · Jun 6, 2025 · Updated last year
k2-fsa / Flow2GAN
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
☆143 · Mar 8, 2026 · Updated 3 months ago
VITA-MLLM / VITA-Audio
✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
☆681 · May 24, 2025 · Updated last year
QuadraV-Speech / funasr_seaco_paraformer_onnx_with_timestamp
Fixes the FunASR bug where seaco-paraformer produces no timestamps after being exported to ONNX.
☆25 · Sep 12, 2024 · Updated last year
AmphionTeam / SpeechJudge
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931)
☆75 · Dec 23, 2025 · Updated 6 months ago
HaujetZhao / Qwen3-ASR-GGUF
Exports the LLM part of Qwen3-ASR to GGUF for accelerated inference with llama.cpp, which supports Vulkan and CUDA acceleration.
☆183 · Apr 29, 2026 · Updated 2 months ago
DataoceanAI / Dolphin
Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.
☆765 · Jun 11, 2026 · Updated 3 weeks ago
Soul-AILab / SoulX-Duplug
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
☆259 · Mar 20, 2026 · Updated 3 months ago
X-LANCE / SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,040 · Jan 15, 2026 · Updated 5 months ago
FunAudioLLM / CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
☆21,878 · May 25, 2026 · Updated last month
TEN-framework / ten-vad
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
☆2,173 · Feb 2, 2026 · Updated 5 months ago
maitrix-org / Voila
☆493 · May 6, 2025 · Updated last year
lovemefan / SenseVoice.cpp
Port of FunASR's SenseVoice model to C/C++
☆561 · Dec 19, 2025 · Updated 6 months ago
Mddct / cosyvoice2-flow-optimized
Faster inference
☆28 · Jan 20, 2025 · Updated last year
k2-fsa / icefall
☆1,436 · Updated this week
zai-org / GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
☆1,030 · Apr 10, 2026 · Updated 2 months ago
vivian556123 / NeurIPS2024-CoVoMix
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
☆67 · Jan 16, 2025 · Updated last year
FireRedTeam / FireRedChat
A Fully Self-Hosted Solution for Full-Duplex Voice Interaction
☆549 · Sep 28, 2025 · Updated 9 months ago