skit-ai/SpeechLLM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/skit-ai/SpeechLLM)

skit-ai / SpeechLLM

This repository contains the training, inference, evaluation code for SpeechLLM models and details about the model releases on huggingface.

☆137

Alternatives and similar repositories for SpeechLLM

Users that are interested in SpeechLLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

X-LANCE / SLAM-LLM
View on GitHub
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,048Jan 15, 2026Updated 6 months ago
wenet-e2e / wesr
View on GitHub
We Speech Transcript based on LLM, in 300 lines of code.
☆182Jun 20, 2025Updated last year
snsun / kaldi-decoder-code-reading
View on GitHub
☆33Oct 28, 2022Updated 3 years ago
nguyenvulebinh / AVSRCocktail
View on GitHub
Audio-Visual Speech Recognition
☆26Jul 7, 2025Updated last year
pengzhendong / ngram-punctuator
View on GitHub
An N-gram punctuator for Chinese and English.
☆18Oct 14, 2025Updated 9 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ga642381 / speech-trident
View on GitHub
Awesome speech/audio LLMs, representation learning, and codec models
☆1,239Jul 10, 2026Updated last week
WWWWxp / Speech-Tokenizer-Papers
View on GitHub
This repository collects papers related to Speech Tokenizer.
☆18Oct 16, 2024Updated last year
talhanai / wer-sigtest
View on GitHub
Script to perform statistical significance test between ASR hypotheses.
☆23Aug 13, 2017Updated 8 years ago
Mddct / WeUSM
View on GitHub
☆13Mar 30, 2023Updated 3 years ago
WangHelin1997 / SSR-Speech
View on GitHub
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
☆154Jan 1, 2025Updated last year
wonjune-kang / llm-speech-summarization
View on GitHub
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
☆20May 14, 2025Updated last year
SpeechColab / GigaSpeech2
View on GitHub
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
☆197Apr 28, 2026Updated 2 months ago
bfs18 / e2_tts
View on GitHub
☆70Sep 3, 2024Updated last year
datemoon / ASR-decoder
View on GitHub
it's ASR decoder and make graph project
☆33May 26, 2022Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
LingweiMeng / Whisper-Sidecar
View on GitHub
The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".
☆34Aug 2, 2025Updated 11 months ago
light1726 / Speech-Tokenization-Papers
View on GitHub
This repository follows papers and reports on discrete speech representation learning and speech tokenization methods for speech language…
☆15Dec 1, 2023Updated 2 years ago
ZQuang2202 / Zipformer_Lightning
View on GitHub
An upgrade framework for train and validate compare with icefall using Lightning.
☆16Mar 26, 2025Updated last year
k2-fsa / fast_rnnt
View on GitHub
A torch implementation of a recursion which turns out to be useful for RNN-T.
☆149Aug 25, 2023Updated 2 years ago
Mddct / simple-tts
View on GitHub
（WIP）long form speech generatoins
☆30Apr 2, 2025Updated last year
jishengpeng / WavChat
View on GitHub
A Survey of Spoken Dialogue Models (60 pages)
☆316Nov 28, 2024Updated last year
SWivid / AUV
View on GitHub
An All-in-One Speech, Sound, Music Codec with Single Nested Codebook
☆28Oct 11, 2025Updated 9 months ago
umbertocappellazzo / Llama-AVSR
View on GitHub
Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat…
☆64Jan 18, 2026Updated 6 months ago
sarulab-speech / whisper-asr-finetune
View on GitHub
☆32Dec 4, 2022Updated 3 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Mddct / cosyvoice2-flow-optimized
View on GitHub
faster inference
☆27Jan 20, 2025Updated last year
tzyll / ChineseHP
View on GitHub
Dataset for Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models in Interspeech 2024.
☆16Jul 4, 2024Updated 2 years ago
nvidia-riva / riva-asrlib-decoder
View on GitHub
Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva
☆91Feb 18, 2025Updated last year
HeCheng0625 / Diffusion-Speech-Tokenizer
View on GitHub
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198Jan 25, 2026Updated 5 months ago
KdaiP / DC-Speech-VAE
View on GitHub
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57Nov 19, 2025Updated 8 months ago
AudioLLMs / AudioBench
View on GitHub
AudioBench: A Universal Benchmark for Audio Large Language Models
☆319May 29, 2026Updated last month
archiki / ASR-Accent-Analysis
View on GitHub
Analysis and investigating the confounding effect of accents in end-to-end Automatic Speech Recognition models.
☆15Jun 27, 2020Updated 6 years ago
BUTSpeechFIT / mt-asr-data-prep
View on GitHub
☆25Feb 26, 2026Updated 4 months ago
gpu-poor / gramvaani_hindi_asr
View on GitHub
This repo contains the baseline model recipes and pre-trained model for GramVanni hindi ASR challenge
☆16Mar 26, 2022Updated 4 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
navana-tech / baseline_recipe_is21s_indic_asr_challenge
View on GitHub
Multilingual and code-switching ASR challenges for low resource Indian languages.
☆23Jul 26, 2021Updated 4 years ago
ga642381 / Speech-Prompts-Adapters
View on GitHub
This Repository surveys the paper focusing on Prompting and Adapters for Speech Processing.
☆113Aug 4, 2023Updated 2 years ago
k2-fsa / text_search
View on GitHub
Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup
☆79Jun 30, 2025Updated last year
usnistgov / F4DE
View on GitHub
Framework for Detection Evaluation (F4DE) : set of evaluation tools for detection evaluations and for specific NIST-coordinated evaluatio…
☆26Jul 6, 2017Updated 9 years ago
xingchensong / TouchNet
View on GitHub
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
☆232Jul 2, 2026Updated 2 weeks ago
bytedance / SALMONN
View on GitHub
SALMONN family: A suite of advanced multi-modal LLMs
☆1,476Updated this week
BUTSpeechFIT / DiaPer
View on GitHub
☆69Feb 8, 2024Updated 2 years ago