yuekaizhang/Fun-ASR-vllm


Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

☆108

Alternatives and similar repositories for Fun-ASR-vllm

Users who are interested in Fun-ASR-vllm are comparing it to the libraries listed below.

QwenAudio / Fun-ASR
Open-source LLM-based ASR model family for Chinese, dialect, accent, and multilingual speech, with FunASR, vLLM, streaming, and llama.cpp…
☆1,438 · Updated this week
xiaomi-research / dasheng-tokenizer
State-of-the-art continuous audio tokenization
☆40 · Mar 9, 2026 · Updated 4 months ago
Wasser1462 / FunASR-nano-onnx
A lightweight demo of FunASR-Nano using ONNX runtime.
☆83 · Feb 25, 2026 · Updated 5 months ago
FireRedTeam / FireRedASR2S
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/…
☆619 · Jun 2, 2026 · Updated last month
QwenLM / Qwen3-ASR
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music…
☆3,239 · Jun 26, 2026 · Updated last month
fengin / Fun-ASR-Nano-2512-Deploy
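The official Fun-ASR-Nano-2512 repository contains a lot of content and has quite a few deployment pitfalls; this project provides a simplified deployment approach.
☆150 · Dec 26, 2025 · Updated 7 months ago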
yfyeung / CLSP
[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
☆104 · Apr 6, 2026 · Updated 3 months ago
fireredchat-submodules / livekit-plugins-fireredchat-pvad
FireRedChat pVAD plugin for LiveKit Agents
☆22 · Sep 16, 2025 · Updated 10 months ago
X-LANCE / Xmart
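Repository of the Xmart Youth Forum, hosting video recordings and lecture notes from past student forums and frontier lectures. For the latest Xmart announcements, follow the WeChat official account "XLANCE Lab".
☆54 · Apr 7, 2026 · Updated 3 months ago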
xingchensong / FlashCosyVoice
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
☆250 · Feb 25, 2026 · Updated 5 months ago
yfyeung / DS-WED
[ICASSP 2026] Official code for "Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration"
☆17 · Apr 16, 2026 · Updated 3 months ago
Gilgamesh-J / X-ASR
X-ASR is a series of automatic speech recognition models based on the icefall framework, focusing on streaming ASR and low-latency deploy…
☆148 · Jul 8, 2026 · Updated 3 weeks ago
Soul-AILab / SoulX-Duplug
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
☆278 · Jul 17, 2026 · Updated last week
pengzhendong / ngram-punctuator
An N-gram punctuator for Chinese and English.
☆18 · Oct 14, 2025 · Updated 9 months ago
xiaomi-research / acavcaps
☆31 · Mar 27, 2026 · Updated 4 months ago
Soul-AILab / SoulX-Transcriber
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
☆284 · Jun 22, 2026 · Updated last month
Soul-AILab / SoulX-Singer-Eval
A Benchmark and Evaluation Suite for Zero-shot Singing Voice Synthesis
☆33 · Feb 11, 2026 · Updated 5 months ago
HaujetZhao / asr-hotword
The best ASR post-processing hotword solution: hotword replacement based on phoneme edit distance.
☆45 · Jun 10, 2026 · Updated last month
pengzhendong / speaker-diarization
Offline Speaker Diarization with SenseVoice by Sherpa ONNX.
☆15 · Dec 23, 2024 · Updated last year
qi-hua / async_cosyvoice
Uses vllm to accelerate cosyvoice2 inference
☆498 · Apr 26, 2025 · Updated last year
pengzhendong / compute-wer
Compute WER and SER for speech recognition evaluation
☆27 · Jun 6, 2026 · Updated last month
pengzhendong / streaming-sensevoice
Pseudo Streaming SenseVoice with Hotwords
☆467 · Jun 15, 2026 · Updated last month
zhaoyx239 / X-Translator
☆26 · Jul 21, 2026 · Updated last week
jingzhunxue / FlowMirror_HydraVox
FlowMirror-HydraVox: A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens…
☆49 · Feb 17, 2026 · Updated 5 months ago
xiaomi-research / dasheng-audiogen
End-to-end text-to-audio scene generation model
☆50 · Jun 16, 2026 · Updated last month
OpenMOSS / MOSS-Speech
MOSS-Speech is a true speech-to-speech large language model without text guidance.
☆139 · Feb 13, 2026 · Updated 5 months ago
ASLP-lab / M7-TTS
M7-TTS: A Mini-Scale Multilingual and Multi-Dialect Text-to-Speech Language Model with Mimi codec and Multi Token Prediction
☆20 · Mar 19, 2026 · Updated 4 months ago
jeremychee4 / AffectSpeech
AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
☆68 · Jun 12, 2026 · Updated last month
Quantatirsk / qwen3-asr
All-in-one Qwen3-ASR server, compatible with the OpenAI API
☆322 · Jul 14, 2026 · Updated 2 weeks ago
pengzhendong / torchfa
Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.
☆61 · Sep 5, 2025 · Updated 10 months ago
Alittleegg / Eureka-Audio
Eureka-Audio: A 1.7B lightweight audio–language model that matches 7B–30B models on ASR, audio understanding, and paralinguistic reasonin…
☆40 · Apr 11, 2026 · Updated 3 months ago
QwenAudio / Fun-Audio-Chat
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
☆985 · Feb 27, 2026 · Updated 5 months ago
xiquan-li / Resonate
[INTERSPEECH 2026] Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation
☆48 · Apr 17, 2026 · Updated 3 months ago
zai-org / GLM-ASR
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
☆835 · Mar 6, 2026 · Updated 4 months ago
leospark / FireRedVAD-Engineering
Lightweight streaming Voice Activity Detection (VAD) tool with ONNX runtime
☆24 · Mar 18, 2026 · Updated 4 months ago
Mddct / cosyvoice2-flow-optimized
faster inference
☆27 · Jan 20, 2025 · Updated last year
pengzhendong / audiolab
A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg)
☆39 · Mar 31, 2026 · Updated 3 months ago
xiaomi-research / tts-prism
☆47 · Apr 27, 2026 · Updated 3 months ago
Mddct / transformer-vocos
☆35 · Sep 6, 2025 · Updated 10 months ago