karthikbhamidipati/multi-task-speech-classification

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/karthikbhamidipati/multi-task-speech-classification)

karthikbhamidipati / multi-task-speech-classification

Multi-Task Speech classification of accent and gender of an english speaker on Mozilla's common voice dataset

☆28

Alternatives and similar repositories for multi-task-speech-classification

Users that are interested in multi-task-speech-classification are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

lifeiteng / NotebookTTS
View on GitHub
Text-To-Speech for NotebookLM
☆39Jul 20, 2025Updated last year
walker-hyf / ECSS
View on GitHub
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)
☆59Jun 20, 2024Updated 2 years ago
glory20h / VoiceLDM
View on GitHub
VoiceLDM: Text-to-Speech with Environmental Context
☆194Aug 9, 2024Updated last year
titu1994 / warprnnt_numba
View on GitHub
WarpRNNT loss ported in Numba CPU/CUDA for Pytorch
☆17Mar 11, 2022Updated 4 years ago
GalaxyCong / StyleDubber
View on GitHub
[ACL 2024] This is the Pytorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
☆98Nov 14, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
scutcsq / Neural-Transducers-for-Two-Stage-Text-to-Speech-via-Semantic-Token-Prediction
View on GitHub
Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (…
☆60Apr 4, 2024Updated 2 years ago
ductuantruong / speaker_age_estimation_ssl_study
View on GitHub
[APSIPA'22] Exploring Speaker Age Estimation on Different Self-Supervised Learning Models
☆14Oct 19, 2022Updated 3 years ago
yangdongchao / ALMTokenizer2
View on GitHub
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆45Sep 5, 2025Updated 10 months ago
jzmzhong / Automatic-Prosody-Annotator-with-SSWP-CLAP
View on GitHub
An automatic prosodic boundary annotation tool for Text-to-Speech Synthesis (TTS).
☆50Jun 11, 2024Updated 2 years ago
walker-hyf / NCSSD
View on GitHub
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆61Nov 1, 2024Updated last year
qiuqiangkong / sampleRNN_acoustic_scene_generation
View on GitHub
☆14Apr 18, 2019Updated 7 years ago
zengchang233 / xiaoicesing2
View on GitHub
The source code for the paper XiaoiceSing2 (interspeech2023)
☆49Jan 15, 2024Updated 2 years ago
kan-bayashi / Taco2withBERT
View on GitHub
Tacotron2 with BERT examples
☆10Jul 8, 2019Updated 7 years ago
line / LibriTTS-P
View on GitHub
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
☆161Jun 13, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
X-LANCE / StoryTTS
View on GitHub
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
☆141Apr 27, 2024Updated 2 years ago
WangHelin1997 / Automatic_Speech_Annotator
View on GitHub
Automatic speech annotator processing speech with voice activaty detection, overlapping speech detection, speaker diarization and automat…
☆33Jun 14, 2024Updated 2 years ago
Quint-e / equivariant-self-supervision-tempo
View on GitHub
Official implementation of "Equivariant Self-Supervision for Musical Tempo Estimation (ISMIR 2022)"
☆26Feb 6, 2023Updated 3 years ago
CODEJIN / NaturalSpeech2
View on GitHub
☆139Jan 7, 2024Updated 2 years ago
sunlicai / HiCMAE
View on GitHub
[Information Fusion 2024] HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition
☆121Aug 29, 2025Updated 10 months ago
justinlovelace / SESD
View on GitHub
☆61Oct 28, 2024Updated last year
dhchoi99 / NANSY
View on GitHub
☆171Jul 25, 2022Updated 3 years ago
cnaigithub / SpeechDewarping
View on GitHub
Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023
☆27Apr 27, 2023Updated 3 years ago
DabDans / AudioMarathon
View on GitHub
Code for "AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs"
☆26Oct 9, 2025Updated 9 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
jishengpeng / TextrolSpeech
View on GitHub
[ICASSP 2024] TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
☆187Nov 22, 2024Updated last year
JeremyCCHsu / Python-Wrapper-for-World-Vocoder
View on GitHub
A Python wrapper for the high-quality vocoder "World"
☆790Jan 21, 2025Updated last year
CODEJIN / XiaoiceSing2
View on GitHub
☆19Feb 2, 2023Updated 3 years ago
nigelgward / midlevel
View on GitHub
Prosodic features for machine-learning applications, in Matlab.
☆15Oct 14, 2025Updated 9 months ago
ybayle / ISM2017
View on GitHub
Reproducible research code for the experiments presented in our article "Kara1k: a karaoke dataset for cover song identification and sing…
☆10Jan 9, 2018Updated 8 years ago
ajaybati / miipher2.0
View on GitHub
Reimplementation of Miipher
☆30Aug 16, 2023Updated 2 years ago
WangHelin1997 / LibriLightMix-WHAMR
View on GitHub
Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM
☆17Nov 7, 2024Updated last year
tomasJwYU / AutoPrepDemo
View on GitHub
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
☆36Dec 31, 2023Updated 2 years ago
facebookresearch / daqa
View on GitHub
Temporal Reasoning via Audio Question Answering
☆27Dec 21, 2019Updated 6 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
reppy4620 / diffusion
View on GitHub
My implementation of diffusion (like) models
☆11Apr 14, 2023Updated 3 years ago
lavendery / AudioComposer
View on GitHub
☆27Sep 10, 2025Updated 10 months ago
Levent9 / Zero-shot-FaceVC
View on GitHub
☆19Mar 2, 2024Updated 2 years ago
AmeenAli / VideoMatch
View on GitHub
☆14Jan 5, 2022Updated 4 years ago
bagustris / ssl-ser
View on GitHub
Repository for reproducing result in journal "Self-supervised learning for Speech Emotion Recognition"
☆10Mar 15, 2023Updated 3 years ago
zhenye234 / CoMoSpeech
View on GitHub
ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
☆214Apr 26, 2024Updated 2 years ago
rishikksh20 / NaturalSpeech2
View on GitHub
☆69May 19, 2023Updated 3 years ago