halsay/ASR-TTS-paper-daily

Updates ASR papers every day

☆513

Alternatives and similar repositories for ASR-TTS-paper-daily

Users who are interested in ASR-TTS-paper-daily are comparing it to the repositories listed below.

liutaocode / TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)
☆663 · Updated this week
LqNoob / Neural-Codec-and-Speech-Language-Models
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
☆246 · Jul 9, 2026 · Updated 3 weeks ago
MatthewCYM / VoiceBench
[TACL'26] VoiceBench: Benchmarking LLM-Based Voice Assistants
☆378 · Jun 11, 2026 · Updated last month
Soul-AILab / SAC
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108 · Nov 1, 2025 · Updated 8 months ago
HeCheng0625 / Diffusion-Speech-Tokenizer
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198 · Jan 25, 2026 · Updated 6 months ago
xingchensong / S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
☆521 · Dec 22, 2025 · Updated 7 months ago
ZhikangNiu / Semantic-VAE
[INTERSPEECH 2026 Oral] Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆121 · Jun 21, 2026 · Updated last month
ga642381 / speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
☆1,240 · Jul 10, 2026 · Updated 2 weeks ago
k2-fsa / ZipVoice
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,021 · Dec 2, 2025 · Updated 7 months ago
YangXusheng-yxs / CodecFormer_5Hz
☆35 · Oct 23, 2025 · Updated 9 months ago
sarulab-speech / UTMOSv2
UTokyo-SaruLab MOS Prediction System
☆357 · Apr 2, 2026 · Updated 3 months ago
inclusionAI / Ming-UniAudio
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆451 · Nov 27, 2025 · Updated 8 months ago
xingchensong / FlashCosyVoice
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
☆250 · Feb 25, 2026 · Updated 5 months ago
X-LANCE / SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,050 · Jan 15, 2026 · Updated 6 months ago
wenet-e2e / west
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206 · Jul 17, 2026 · Updated last week
BinWang28 / audio-ai-hub
The hub for audio AI research: papers, open models, benchmarks & datasets across audio LLMs, speech recognition, TTS, music & audio gener…
☆950 · Updated this week
KdaiP / DC-Speech-VAE
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57 · Nov 19, 2025 · Updated 8 months ago
xiquan-li / MeanAudio
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆145 · Sep 2, 2025 · Updated 10 months ago
ddlBoJack / Awesome-Speech-Language-Model
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
☆202 · Jun 7, 2026 · Updated last month
rishikksh20 / MiniMax-TTS-pytorch
Try to replicate the architecture of MiniMaxTTS mentioned in its technical report
☆47 · Sep 2, 2025 · Updated 10 months ago
imxtx / awesome-controllable-speech-synthesis
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey".
☆276 · Jul 21, 2026 · Updated last week
yangdongchao / RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
☆255 · Mar 26, 2025 · Updated last year
Ereboas / MagiCodec
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆125 · Jun 4, 2025 · Updated last year
LAION-AI / emotion-annotations
☆110 · Jul 15, 2026 · Updated 2 weeks ago
nonverbalspeech38k / nonverspeech38k
The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…
☆68 · Dec 26, 2025 · Updated 7 months ago
kyutai-labs / nanoGPTaudio
Code for the blog "Neural audio codecs: how to get audio into LLMs"
☆174 · Oct 20, 2025 · Updated 9 months ago
xingchensong / TouchNet
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
☆233 · Jul 2, 2026 · Updated 3 weeks ago
FrontierLabs / F5R-TTS
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
☆169 · Mar 3, 2026 · Updated 4 months ago
yl4579 / DMOSpeech2
☆302 · Jul 22, 2025 · Updated last year
ASLP-lab / MeanVC
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
☆298 · Jan 8, 2026 · Updated 6 months ago
Diamondfan / Child-ASR-Paper
A list of papers for child ASR
☆54 · Oct 8, 2024 · Updated last year
k2-fsa / Flow2GAN
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
☆146 · Mar 8, 2026 · Updated 4 months ago
lmxue / Audio-FLAN
Audio-FLAN
☆161 · Sep 23, 2025 · Updated 10 months ago
k2-fsa / icefall
☆1,465 · Jul 16, 2026 · Updated last week
Choddeok / EmoSphere-TTS
[INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …
☆182 · Jul 16, 2026 · Updated last week
Aria-K-Alethia / BigCodec
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
☆218 · Sep 19, 2024 · Updated last year
FireRedTeam / FireRedVAD
A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, F…
☆472 · May 6, 2026 · Updated 2 months ago
X-LANCE / VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
☆376 · Sep 3, 2024 · Updated last year
haoheliu / SemantiCodec-inference
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
☆255 · Mar 7, 2025 · Updated last year