alexkroman/tiny-audio

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/alexkroman/tiny-audio)

alexkroman / tiny-audio

Train your own speech AI model from scratch

☆152

Alternatives and similar repositories for tiny-audio

Users that are interested in tiny-audio are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

LilDevsy0117 / Ultra-Sortformer
View on GitHub
Ultra-Sortformer for Scalable Speaker Diarization
☆27Apr 9, 2026Updated 3 months ago
Zhongxu-Wang / ArtSpeech
View on GitHub
ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations
☆22Sep 21, 2025Updated 10 months ago
0xSojalSec / anylabeling
View on GitHub
Effortless AI-assisted data labeling with AI support from YOLO, Segment Anything (SAM+SAM2), MobileSAM!!
☆82May 4, 2025Updated last year
ductuantruong / speaker_age_estimation_ssl_study
View on GitHub
[APSIPA'22] Exploring Speaker Age Estimation on Different Self-Supervised Learning Models
☆14Oct 19, 2022Updated 3 years ago
ArenAcikgoz / Whisper-Alignment
View on GitHub
Forced alignment decoder for Whisper.
☆16Mar 13, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
pkufool / simple-wer
View on GitHub
A simple command line tool to calculate WER for ASR.
☆14Updated this week
Vyvo-Labs / CodecHub
View on GitHub
CodecHub: A Unified Library for Codec Models
☆25Dec 24, 2025Updated 7 months ago
ysharma3501 / LinaCodec
View on GitHub
A highly compressive and high-quality neural audio codec for speech models.
☆269Jan 23, 2026Updated 6 months ago
yxlu-0102 / IDEA-TTS
View on GitHub
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis
☆27Mar 21, 2025Updated last year
corticph / error-align
View on GitHub
Text-to-text alignment algorithm for speech recognition error analysis.
☆32Jun 23, 2026Updated last month
AkshathRaghav / tinyspeech
View on GitHub
Code release for "TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices"
☆23Jun 7, 2025Updated last year
Tikai7 / DiTTO-TTS
View on GitHub
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
☆39Feb 11, 2025Updated last year
EndlessReform / smoltts
View on GitHub
Open TTS models, built for streaming on the edge
☆45Mar 16, 2025Updated last year
WhissleAI / PromptingNemo
View on GitHub
All-in-one Speech Transcription
☆11Jun 5, 2026Updated last month
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
Vyvo-Labs / VyvoTTS
View on GitHub
VyvoTTS: LLM-Based Text-to-Speech Training Framework
☆257Apr 8, 2026Updated 3 months ago
Deep-unlearning / Finetune-Parakeet
View on GitHub
☆25Oct 22, 2025Updated 9 months ago
nineninesix-ai / gepard-train
View on GitHub
☆24Jul 12, 2026Updated 2 weeks ago
tarun360 / SpeakerProfiling
View on GitHub
Estimating the Age, Height, and Gender of a speaker with their speech signal.
☆15Sep 19, 2022Updated 3 years ago
pengzhendong / ngram-punctuator
View on GitHub
An N-gram punctuator for Chinese and English.
☆18Oct 14, 2025Updated 9 months ago
thu-spmi / CTC-TTS
View on GitHub
Code for CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment, Interspeech 2026.
☆20Jun 9, 2026Updated last month
pengzhendong / torchfa
View on GitHub
Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.
☆61Sep 5, 2025Updated 10 months ago
yuhangear / wenet-android
View on GitHub
☆13Oct 27, 2021Updated 4 years ago
audiodemo / voice-conversion
View on GitHub
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
☆17Aug 18, 2023Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
zhu-han / SpeechLLM
View on GitHub
LLM-based ASR recipe with Zipformer encoder and Qwen LLM
☆35Sep 25, 2025Updated 10 months ago
ORI-Muchim / Efficient-Speech
View on GitHub
Lightweight Korean TTS Model based on FastSpeech2
☆15Mar 4, 2026Updated 4 months ago
ydqmkkx / ShallowFlowMatching-TTS
View on GitHub
Official implementation of paper: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
☆55Sep 20, 2025Updated 10 months ago
hqsiswiliam / punctuation-restoration-scl
View on GitHub
Token-Level Supervised Contrastive Learning for Punctuation Restoration
☆29Sep 8, 2021Updated 4 years ago
llm-lab-org / CLASP
View on GitHub
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
☆13Jun 27, 2025Updated last year
primepake / learnable-speech
View on GitHub
This repo is text to speech with learnable audio encoder without alignment with transcript reference
☆54Sep 20, 2025Updated 10 months ago
nonverbalspeech38k / nonverspeech38k
View on GitHub
The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…
☆68Dec 26, 2025Updated 7 months ago
jonirajala / kokoro_training
View on GitHub
Training code for kokoro tts model
☆45Nov 15, 2025Updated 8 months ago
LEMAS-Project / LEMAS-TTS
View on GitHub
LEMAS‑TTS is a multilingual zero‑shot text‑to‑speech system, supporting 10 languages: Chinese English Spanish Russian French German Ital…
☆101Mar 31, 2026Updated 3 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
taeyoun811 / Whisfusion
View on GitHub
Whisfusion: Parallel ASR Decoding via a Diffusion Transformer
☆31Aug 22, 2025Updated 11 months ago
aiola-lab / drax
View on GitHub
Drax: Speech Recognition with Discrete Flow Matching
☆75Oct 15, 2025Updated 9 months ago
dywsy21 / STCTS
View on GitHub
A voice communication system that achieves ~80 bps while maintaining near-natural quality
☆48Dec 3, 2025Updated 7 months ago
abatsakidis / FaceRecognition
View on GitHub
Multi Face Recognition and Detection
☆69Nov 1, 2022Updated 3 years ago
RuABraun / texterrors
View on GitHub
☆37Jun 9, 2026Updated last month
kyutai-labs / nanoGPTaudio
View on GitHub
Code for the blog "Neural audio codecs: how to get audio into LLMs"
☆174Oct 20, 2025Updated 9 months ago
HeCheng0625 / Diffusion-Speech-Tokenizer
View on GitHub
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198Jan 25, 2026Updated 6 months ago