yxduir/LLM-SRT

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/yxduir/LLM-SRT)

yxduir / LLM-SRT

☆28

Alternatives and similar repositories for LLM-SRT

Users that are interested in LLM-SRT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

yxduir / m2m-70
View on GitHub
☆18Jun 25, 2026Updated last month
RyanLoil / ProjectEval
View on GitHub
☆29Jul 3, 2026Updated 3 weeks ago
llm-jp / llama-mimi
View on GitHub
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…
☆31Sep 20, 2025Updated 10 months ago
BUTSpeechFIT / SOT-DiCoW
View on GitHub
Multi-talker ASR based on DiCoW with Serialized Output Training
☆21Sep 18, 2025Updated 10 months ago
nickjw0205 / Improving-ASR-with-LLM-Description
View on GitHub
☆20Sep 2, 2024Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
mubingshen / MLC-SLM-Baseline
View on GitHub
The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-…
☆51May 14, 2025Updated last year
facebookresearch / lst
View on GitHub
Code for Latent Speech-Text Transformer (LST)
☆35Mar 12, 2026Updated 4 months ago
introlab / uimvdr
View on GitHub
☆13Oct 11, 2024Updated last year
nethermanpro / ComSL
View on GitHub
☆11Oct 14, 2023Updated 2 years ago
Exgc / AVMuST-TED
View on GitHub
☆24Mar 30, 2024Updated 2 years ago
mathllm / VoiceAssistant-Eval
View on GitHub
A rigorous framework for evaluating and guiding the development of next-generation AI assistants.
☆19Jan 26, 2026Updated 6 months ago
SakanaAI / kame_finetune
View on GitHub
☆30Jul 16, 2026Updated 2 weeks ago
wonjune-kang / expressive-speech-retrieval
View on GitHub
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15Aug 18, 2025Updated 11 months ago
rithiksachdev / PostASR-Correction-SLT2024
View on GitHub
☆18Jul 22, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Observeai-Research / Phoneme-BERT
View on GitHub
☆34Jun 15, 2021Updated 5 years ago
audiosae / audio-sae
View on GitHub
Demo for AudioSAE paper
☆15Apr 26, 2026Updated 3 months ago
Tencent / StableToken
View on GitHub
[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.
☆33Feb 27, 2026Updated 5 months ago
IiuZiKai / Evo_TSE
View on GitHub
☆17Apr 9, 2026Updated 3 months ago
lavendery / UUG
View on GitHub
☆21Sep 14, 2025Updated 10 months ago
exercise-book-yq / Supercodec
View on GitHub
☆51Mar 5, 2026Updated 4 months ago
umbertocappellazzo / Omni-AVSR
View on GitHub
Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…
☆38Mar 10, 2026Updated 4 months ago
FreedomIntelligence / S2S-Arena
View on GitHub
☆21Jun 4, 2026Updated last month
zjwang21 / mix-phoneme-bert
View on GitHub
An unofficial PyTorch implementation of Mix-Phoneme-Bert
☆40Jul 10, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ajaybati / miipher2.0
View on GitHub
Reimplementation of Miipher
☆30Aug 16, 2023Updated 2 years ago
choijeongsoo / utut
View on GitHub
[TASLP 2024] Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
☆31Sep 6, 2024Updated last year
arxrean / LipRead-seq2seq
View on GitHub
An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.
☆10May 13, 2020Updated 6 years ago
line / WaveTrainerFit
View on GitHub
Official implementation of "Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech G…
☆16Feb 6, 2026Updated 5 months ago
ufal / augpt
View on GitHub
DSTC9 Submission
☆16Apr 12, 2021Updated 5 years ago
ZhangXinWhut / SimWhisper-Codec
View on GitHub
Official code for paper:"Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding"
☆37Jan 28, 2026Updated 6 months ago
Mddct / transformer-vocos
View on GitHub
☆35Sep 6, 2025Updated 10 months ago
LingweiMeng / Whisper-Sidecar
View on GitHub
The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".
☆34Aug 2, 2025Updated 11 months ago
cuhealthybrains / MT-LLM
View on GitHub
The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"
☆51Apr 7, 2025Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
h-munakata / Lighthouse-Wrapper-for-Audio-Moment-Retrieval
View on GitHub
☆13Mar 23, 2026Updated 4 months ago
lysanderism / TimeAudio
View on GitHub
The official repository TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…
☆30Nov 18, 2025Updated 8 months ago
kjw11 / CSEnet-ASR
View on GitHub
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆12Mar 14, 2025Updated last year
vivian556123 / NeurIPS2024-CoVoMix
View on GitHub
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
☆67Jan 16, 2025Updated last year
LeiLiLab / InfiniSST
View on GitHub
☆25May 27, 2026Updated 2 months ago
slSeanWU / beats-conformer-bart-audio-captioner
View on GitHub
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…
☆41Jan 6, 2024Updated 2 years ago
corticph / error-align
View on GitHub
Text-to-text alignment algorithm for speech recognition error analysis.
☆32Jun 23, 2026Updated last month