cuhealthybrains/MT-LLM

The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"

☆51

Alternatives and similar repositories for MT-LLM

Users that are interested in MT-LLM are comparing it to the libraries listed below.

kjw11 / Speaker-Aware-CTC
Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.
☆22 · May 26, 2025 · Updated last year
Shy-98 / MELLE
Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"
☆41 · Jun 28, 2025 · Updated last year
LingweiMeng / Whisper-Sidecar
The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".
☆34 · Aug 2, 2025 · Updated 11 months ago
kjw11 / CSEnet-ASR
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆12 · Mar 14, 2025 · Updated last year
LingweiMeng / QualifyingExamPreparing
Qualifying Exam Preparing
☆18 · May 7, 2025 · Updated last year
LingweiMeng / MyChatGPT
A casual and simple ChatGPT Python script that can run in a terminal (as long as you have an API). Supports the Azure API.
☆20 · May 3, 2025 · Updated last year
HuangZiliAndy / SSL_for_multitalker
Adapting Self-Supervised Models to Multi-Talker Speech Recognition Using Speaker Embeddings
☆33 · Mar 16, 2023 · Updated 3 years ago
rithiksachdev / PostASR-Correction-SLT2024
☆18 · Jul 22, 2024 · Updated 2 years ago
Aisaka0v0 / TS-Whisper
☆33 · Jun 12, 2025 · Updated last year
y-ren16 / OV-InstructTTS
☆22 · Jan 27, 2026 · Updated 5 months ago
HappyColor / DrawSpeech_PyTorch
☆25 · Nov 25, 2025 · Updated 8 months ago
Bartelds / ctc-dro
Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition.
☆17 · May 16, 2025 · Updated last year
zxzhao0 / C2SER
We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…
☆49 · Mar 3, 2025 · Updated last year
nickjw0205 / Improving-ASR-with-LLM-Description
☆20 · Sep 2, 2024 · Updated last year
lifeiteng / NotebookTTS
Text-To-Speech for NotebookLM
☆39 · Jul 20, 2025 · Updated last year
kuan2jiu99 / Awesome-Speech-Generation
Survey on speech generation work.
☆21 · Nov 26, 2023 · Updated 2 years ago
wonjune-kang / expressive-speech-retrieval
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15 · Aug 18, 2025 · Updated 11 months ago
voidful / Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
☆308 · Jul 4, 2026 · Updated 3 weeks ago
Audio-Foundation-Models / ConversationTTS
☆101 · Jan 19, 2026 · Updated 6 months ago
lin9x / AV-Sepformer
☆65 · Jun 28, 2023 · Updated 3 years ago
tzyll / ChineseHP
Dataset for Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models in Interspeech 2024.
☆16 · Jul 4, 2024 · Updated 2 years ago
cristinae / ASRdys
ASR for dysarthric speakers with Kaldi
☆13 · Jan 14, 2017 · Updated 9 years ago
flamed-tts / Flamed-TTS
This repository implements a novel zero-shot TTS framework, named Flamed-TTS, focusing on the efficient generation and dynamic pacing in …
☆57 · Aug 9, 2025 · Updated 11 months ago
mubingshen / MLC-SLM-Baseline
The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-…
☆51 · May 14, 2025 · Updated last year
Helw150 / levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
☆16 · Jun 16, 2024 · Updated 2 years ago
Aria-K-Alethia / BigCodec
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
☆218 · Sep 19, 2024 · Updated last year
JusperLee / Gull-Codec-Training
☆12 · Mar 11, 2025 · Updated last year
declare-lab / HyperTTS
☆40 · Apr 15, 2024 · Updated 2 years ago
hhhaaahhhaa / ASR-TTA
☆16 · Nov 4, 2025 · Updated 8 months ago
BUTSpeechFIT / SOT-DiCoW
Multi-talker ASR based on DiCoW with Serialized Output Training
☆20 · Sep 18, 2025 · Updated 10 months ago
ASLP-lab / Speaker-Reasoner
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
☆93 · May 13, 2026 · Updated 2 months ago
01Zhangbw / Speech-and-audio-papers-Top-Conference
☆141 · Jan 24, 2026 · Updated 6 months ago
NKU-HLT / SpeechLLM-as-Judges
[ACL 2026]
☆25 · Dec 6, 2025 · Updated 7 months ago
FrontierLabs / F5R-TTS
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
☆169 · Mar 3, 2026 · Updated 4 months ago
light1726 / Speech-Tokenization-Papers
This repository follows papers and reports on discrete speech representation learning and speech tokenization methods for speech language…
☆15 · Dec 1, 2023 · Updated 2 years ago
efeslab / LiteASR
[EMNLP Main '25] LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
☆154 · May 18, 2025 · Updated last year
JusperLee / Look2hear
A toolkit for researchers in the multimodal sound separation.
☆16 · Oct 20, 2023 · Updated 2 years ago
FreedomIntelligence / MTalk-Bench
MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols
☆20 · Nov 19, 2025 · Updated 8 months ago
facebookresearch / lst
Code for Latent Speech-Text Transformer (LST)
☆35 · Mar 12, 2026 · Updated 4 months ago