ASLP-lab/OSUM

OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.

☆494

Alternatives and similar repositories for OSUM

Users interested in OSUM are comparing it to the repositories listed below.

ASLP-lab / Speaker-Reasoner
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
☆93 · May 13, 2026 · Updated 2 months ago
MrSupW / ContextASR-Bench
A Massive Contextual Speech Recognition Benchmark.
☆107 · Aug 6, 2025 · Updated 11 months ago
ASLP-lab / MINT-Bench
☆48 · May 2, 2026 · Updated 2 months ago
ASLP-lab / WenetSpeech-Wu-Repo
A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations
☆170 · Feb 6, 2026 · Updated 5 months ago
ASLP-lab / MeanVC
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
☆296 · Jan 8, 2026 · Updated 6 months ago
wenet-e2e / west
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206 · Updated this week
ASLP-lab / VoiceSculptor
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
☆250 · Feb 26, 2026 · Updated 4 months ago
qualialabsAI / SmoothConv-DuplexConv
☆82 · Jun 12, 2026 · Updated last month
ASLP-lab / ArxivWatcher
☆31 · Jun 15, 2026 · Updated last month
Soul-AILab / SoulX-Transcriber
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
☆282 · Jun 22, 2026 · Updated 3 weeks ago
xingchensong / TouchNet
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
☆232 · Jul 2, 2026 · Updated 2 weeks ago
ASLP-lab / M7-TTS
M7-TTS: A Mini-Scale Multilingual and Multi-Dialect Text-to-Speech Language Model with Mimi codec and Multi Token Prediction
☆20 · Mar 19, 2026 · Updated 4 months ago
ASLP-lab / C2SER
We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…
☆17 · Mar 3, 2025 · Updated last year
ASLP-lab / DiffRhythm2
Di♪♪Rhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching
☆166 · Nov 9, 2025 · Updated 8 months ago
ASLP-lab / WenetSpeech-Yue
A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation
☆341 · Jun 6, 2026 · Updated last month
ASLP-lab / FMSU
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
☆25 · May 21, 2026 · Updated 2 months ago
ASLP-lab / WenetSpeech-Chuan
Official repository for the WenetSpeech-Chuan dataset.
☆218 · Jul 14, 2026 · Updated last week
xingchensong / S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
☆517 · Dec 22, 2025 · Updated 6 months ago
ASLP-lab / LLaSA_Plus
Llasa Speed Up
☆64 · Jan 18, 2026 · Updated 6 months ago
disco-speech / DisCo-Speech
☆90 · Dec 31, 2025 · Updated 6 months ago
xingchensong / FlashCosyVoice
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
☆250 · Feb 25, 2026 · Updated 4 months ago
jishengpeng / WavChat
A Survey of Spoken Dialogue Models (60 pages)
☆316 · Nov 28, 2024 · Updated last year
X-LANCE / SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,048 · Jan 15, 2026 · Updated 6 months ago
ASLP-lab / SongEval
A song aesthetic evaluation toolkit trained on SongEval.
☆314 · Apr 8, 2026 · Updated 3 months ago
VITA-MLLM / Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
☆388 · May 27, 2025 · Updated last year
ASLP-lab / OmniCodec
OmniCodec: Low Frame Rate Universal Audio Codec with Semantic–Acoustic Disentanglement
☆46 · Apr 17, 2026 · Updated 3 months ago
ASLP-lab / LLaSE-G1
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
☆47 · Mar 10, 2025 · Updated last year
baichuan-inc / Baichuan-Audio
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
☆223 · Feb 28, 2025 · Updated last year
ASLP-lab / OSUM-Pangu
An Open-Source Multidimension Speech Understanding Foundation Model Built upon OpenPangu on Ascend NPUs
☆33 · Mar 15, 2026 · Updated 4 months ago
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,937 · Feb 25, 2026 · Updated 4 months ago
thuhcsi / SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆197 · Feb 28, 2026 · Updated 4 months ago
ASLP-lab / Hum-Dial
ICASSP2026 HumDial Challenge
☆50 · May 28, 2026 · Updated last month
AmphionTeam / Emilia-NV
Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"
☆91 · Sep 18, 2025 · Updated 10 months ago
ASLP-lab / SmartGlasses
This challenge focuses on evaluating speech recognition and semantic understanding capabilities of AI glasses in complex real-world envir…
☆18 · Jun 27, 2026 · Updated 3 weeks ago
ASLP-lab / SenSE
Official code of SenSE.
☆90 · Oct 30, 2025 · Updated 8 months ago
gyt1145028706 / XY-Tokenizer
This is the code for paper: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs
☆96 · Sep 19, 2025 · Updated 10 months ago
lmxue / Audio-FLAN
Audio-FLAN
☆161 · Sep 23, 2025 · Updated 9 months ago
pengzhendong / audiolab
A streaming audio reader, processor, and writer built on top of soundfile, and PyAV (bindings for FFmpeg)
☆39 · Mar 31, 2026 · Updated 3 months ago
Kevin-naticl / LLaSE-G1
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
☆105 · Apr 1, 2025 · Updated last year