zai-org/GLM-ASR

zai-org / GLM-ASR

GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters

☆836

Alternatives and similar repositories for GLM-ASR

Users who are interested in GLM-ASR are comparing it to the repositories listed below.

zai-org / GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
☆1,044 · Apr 10, 2026 · Updated 3 months ago
QwenAudio / Fun-ASR
Open-source LLM-based ASR model family for Chinese, dialect, accent, and multilingual speech, with FunASR, vLLM, streaming, and llama.cpp…
☆1,425 · Updated this week
FireRedTeam / FireRedASR2S
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/…
☆614 · Jun 2, 2026 · Updated last month
QwenLM / Qwen3-ASR
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music…
☆3,214 · Jun 26, 2026 · Updated 3 weeks ago
wenet-e2e / west
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206 · Jul 17, 2026 · Updated last week
QwenAudio / Fun-Audio-Chat
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
☆985 · Feb 27, 2026 · Updated 4 months ago
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,940 · Feb 25, 2026 · Updated 5 months ago
XiaomiMiMo / MiMo-V2.5-ASR
Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios
☆317 · Apr 23, 2026 · Updated 3 months ago
XiaomiMiMo / MiMo-Audio
MiMo-Audio: Audio Language Models are Few-Shot Learners
☆1,066 · Jun 17, 2026 · Updated last month
inclusionAI / Ming-UniAudio
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆450 · Nov 27, 2025 · Updated 7 months ago
stepfun-ai / Step-Audio-R1
☆690 · Apr 29, 2026 · Updated 2 months ago
xzf-thu / Voices-in-the-Wild-Bench
☆28 · May 22, 2026 · Updated 2 months ago
xingchensong / FlashCosyVoice
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
☆250 · Feb 25, 2026 · Updated 5 months ago
xiquan-li / MeanAudio
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆142 · Sep 2, 2025 · Updated 10 months ago
k2-fsa / ZipVoice
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,018 · Dec 2, 2025 · Updated 7 months ago
inclusionAI / Ming-omni-tts
Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control
☆263 · Feb 26, 2026 · Updated 4 months ago
ASLP-lab / VoiceSculptor
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
☆250 · Feb 26, 2026 · Updated 4 months ago
XiaomiMiMo / MiMo-Audio-Tokenizer
A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.
☆145 · Sep 19, 2025 · Updated 10 months ago
stepfun-ai / Step-Audio2
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…
☆1,487 · Mar 16, 2026 · Updated 4 months ago
OpenBMB / UltraEval-Audio
Your faithful, impartial partner for audio evaluation — know yourself, know your rivals. 真实评测，知己知彼。A unified benchmark framework for ASR/…
☆309 · Updated this week
xcc-zach / xtalk
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech…
☆231 · Updated this week
k2-fsa / Flow2GAN
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
☆145 · Mar 8, 2026 · Updated 4 months ago
Audio-Reasoning-Challenge / Audio-Reasoning-Challenge-Baselines
The baselines of ARC-Challenge-Interspeech2026
☆60 · Dec 1, 2025 · Updated 7 months ago
Soul-AILab / SoulX-Duplug
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
☆275 · Jul 17, 2026 · Updated last week
facebookresearch / omnilingual-asr
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
☆2,859 · Dec 30, 2025 · Updated 6 months ago
kandinskylab / kvae-audio
KVAE-Audio: a continuous full-band audio waveform autoencoder
☆101 · Updated this week
xingchensong / TouchNet
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
☆233 · Jul 2, 2026 · Updated 3 weeks ago
yuekaizhang / Fun-ASR-vllm
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
☆107 · Jul 7, 2026 · Updated 2 weeks ago
stepfun-ai / Step-Audio-EditX
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model that excels at editing emotion, speaking style, and paralinguistics…
☆954 · Apr 9, 2026 · Updated 3 months ago
ASLP-lab / FlashTTS
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
☆67 · Jun 16, 2026 · Updated last month
FireRedTeam / FireRedVAD
A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, F…
☆472 · May 6, 2026 · Updated 2 months ago
CarlWangChina / QwenFeat-Vocal-Score
VocalVerse: A powerful vocal evaluation framework powered by the Qwen LLMs
☆49 · May 11, 2026 · Updated 2 months ago
Soul-AILab / SoulX-Transcriber
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
☆284 · Jun 22, 2026 · Updated last month
pengzhendong / streaming-sensevoice
Pseudo Streaming SenseVoice with Hotwords
☆466 · Jun 15, 2026 · Updated last month
yfyeung / CLSP
[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
☆104 · Apr 6, 2026 · Updated 3 months ago
ictnlp / SLED-TTS
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
☆108 · May 20, 2025 · Updated last year
xiaomi-research / dasheng-lm
Efficient audio understanding with general audio captions
☆429 · Apr 24, 2026 · Updated 3 months ago
ASLP-lab / FastTurn
☆33 · May 19, 2026 · Updated 2 months ago
hhguo / SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92 · Dec 20, 2024 · Updated last year