mubingshen/MLC-SLM-Baseline

This project is associated with the recently launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-SLM). It provides participants with baseline systems for speech recognition and speaker diarization in multilingual conversational scenarios.

☆51

Alternatives and similar repositories for MLC-SLM-Baseline

Users who are interested in MLC-SLM-Baseline are comparing it to the repositories listed below.

BUTSpeechFIT / SOT-DiCoW
Multi-talker ASR based on DiCoW with Serialized Output Training
☆21 · Sep 18, 2025 · Updated 10 months ago
wonjune-kang / expressive-speech-retrieval
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15 · Aug 18, 2025 · Updated 11 months ago
SSTC-Challenge / SSTC2024_baseline_system
☆12 · Jun 14, 2024 · Updated 2 years ago
ASLP-lab / Speaker-Reasoner
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
☆93 · May 13, 2026 · Updated 2 months ago
ASLP-lab / FMSU-Bench
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
☆25 · May 21, 2026 · Updated 2 months ago
LAION-AI / scaled-echo-tts
Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024…
☆24 · Mar 29, 2026 · Updated 3 months ago
rithiksachdev / PostASR-Correction-SLT2024
☆18 · Jul 22, 2024 · Updated 2 years ago
BUTSpeechFIT / mt-asr-data-prep
☆25 · Feb 26, 2026 · Updated 5 months ago
Audio-Reasoning-Challenge / Audio-Reasoning-Challenge-Baselines
The baselines of ARC-Challenge-Interspeech2026
☆60 · Dec 1, 2025 · Updated 7 months ago
BUTSpeechFIT / TS-ASR-Whisper
☆116 · Jun 29, 2026 · Updated last month
ASLP-lab / SmartGlasses
This challenge focuses on evaluating speech recognition and semantic understanding capabilities of AI glasses in complex real-world envir…
☆18 · Jun 27, 2026 · Updated last month
desh2608 / diarizer
Clustering-based methods for overlapping diarization
☆84 · Jan 12, 2024 · Updated 2 years ago
y-ren16 / OV-InstructTTS
☆22 · Jan 27, 2026 · Updated 6 months ago
TaoRuijie / MFV-KSD
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024)
☆22 · Jul 25, 2024 · Updated 2 years ago
merlresearch / sebbs
Prediction of sound event bounding boxes (SEBBs)
☆35 · Aug 2, 2024 · Updated last year
yxduir / LLM-SRT
☆28 · Mar 11, 2026 · Updated 4 months ago
MCoRec / mcorec_baseline
CHiME-9 Task 1 - MCoRec baseline
☆28 · Jan 13, 2026 · Updated 6 months ago
AudenAI / Auden
☆71 · Apr 2, 2026 · Updated 3 months ago
teamtee / Qwen2-Audio-finetune
This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.
☆50 · Jul 28, 2025 · Updated last year
fgnt / speaker_reassignment
Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment
☆14 · Feb 5, 2025 · Updated last year
SpeechColab / GigaSpeechBench
☆29 · Jul 21, 2026 · Updated last week
Mddct / transformer-vocos
☆35 · Sep 6, 2025 · Updated 10 months ago
kyegomez / AudioFlamingo
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…
☆39 · Jan 27, 2025 · Updated last year
lifeiteng / NotebookTTS
Text-To-Speech for NotebookLM
☆39 · Jul 20, 2025 · Updated last year
wenet-e2e / wesep
Target Speaker Extraction Toolkit
☆300 · Oct 4, 2025 · Updated 9 months ago
kaistmm / seed-pytorch
[INTERSPEECH 2025] Official code for "SEED: Speaker Embedding Enhancement Diffusion Model"
☆59 · Nov 3, 2025 · Updated 8 months ago
Lab-MSP / NaturalVoices
☆33 · Oct 28, 2025 · Updated 9 months ago
liyunlongaaa / NSD-MS2S
CHIME-7/8 diarization champion system: neural speaker diarization using memory-aware multi-speaker embedding with sequence-to-sequence ar…
☆88 · Jun 17, 2025 · Updated last year
wenet-e2e / wesr
We Speech Transcript based on LLM, in 300 lines of code.
☆182 · Jun 20, 2025 · Updated last year
ASLP-lab / ArxivWatcher
☆32 · Jun 15, 2026 · Updated last month
backspacetg / distilXLSR
Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model
☆13 · Mar 30, 2025 · Updated last year
MrSupW / ContextASR-Bench
A Massive Contextual Speech Recognition Benchmark.
☆107 · Aug 6, 2025 · Updated 11 months ago
BUTSpeechFIT / DiCoW
☆100 · Jan 28, 2026 · Updated 6 months ago
lmxue / NVV-SuperBench
NVV-SuperBench: Beyond Words, Beyond Quality—Benchmarking Nonverbal Vocalizations in Speech Generation (Interspeech 2026 long paper)
☆18 · Jun 21, 2026 · Updated last month
hongfeixue / StutteringSpeechChallenge
SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
☆12 · Jun 11, 2024 · Updated 2 years ago
popcornell / FastMSS
☆33 · Updated this week
corticph / error-align
Text-to-text alignment algorithm for speech recognition error analysis.
☆32 · Jun 23, 2026 · Updated last month
microsoft / NOTSOFAR1-Challenge
NOTSOFAR-1 Challenge: Distant Diarization and ASR
☆65 · Feb 12, 2025 · Updated last year
ZXHY-82 / w2v-BERT-2.0_SV
☆53 · Mar 28, 2026 · Updated 4 months ago