jasonppy/syllable-discovery

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/jasonppy/syllable-discovery)

jasonppy / syllable-discovery

Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model

☆35

Alternatives and similar repositories for syllable-discovery

Users that are interested in syllable-discovery are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

jasonppy / word-discovery
View on GitHub
Word Discovery in Visually Grounded, Self-Supervised Speech Models
☆27Dec 4, 2023Updated 2 years ago
cheoljun95 / sdhubert
View on GitHub
☆27Dec 4, 2024Updated last year
jasonppy / FaST-VGS-Family
View on GitHub
Transformer-based visually grounded speech models
☆19Sep 22, 2022Updated 3 years ago
nii-yamagishilab / speaker_sex_attribute_privacy
View on GitHub
Project for HIDING SPEAKER’S SEX IN SPEECH USING ZERO-EVIDENCE SPEAKER REPRESENTATION IN AN ANALYSIS/SYNTHESIS PIPELINE
☆15Nov 30, 2022Updated 3 years ago
jasonppy / PromptingWhisper
View on GitHub
Promting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
☆151Jan 16, 2024Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
lstrgar / ss-phoneme-seg
View on GitHub
Code for "Phoneme Segmentation Using Self-Supervised Speech Models", Strgar & Harwath, Proceedings of the IEEE Spoken Language Technology…
☆55Nov 4, 2022Updated 3 years ago
mushanshanshan / ESLTTS
View on GitHub
ESLTTS dataset
☆16Feb 6, 2025Updated last year
CoEDL / vad-sli-asr
View on GitHub
A pipeline to isolate and transcribe one language in mixed-language speech
☆20Oct 25, 2022Updated 3 years ago
seahore / PPG-GradVC
View on GitHub
A diffusion-based cross-lingual voice conversion model, as my bachelor's thesis
☆45Jul 24, 2023Updated 3 years ago
changelinglab / PhoneticXeus
View on GitHub
A universal phone recognizer that can transcribe speech in 70+ languages into IPA
☆28Jun 9, 2026Updated last month
kamperh / vqwordseg
View on GitHub
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.
☆39May 5, 2026Updated 2 months ago
JSALT-2022-SSL / superb-prosody
View on GitHub
☆31Jul 13, 2023Updated 3 years ago
y-chan / hifi-gan-misrnet
View on GitHub
unofficial pytorch implementation of HiFi-GAN with fast MISR.
☆15Mar 21, 2023Updated 3 years ago
huutuongtu / Lightvoc
View on GitHub
LIGHTVOC AN UPSAMPLING-FREE GAN VOCODER BASED ON CONFORMER AND INVERSE SHORT-TIME FOURIER TRANSFORM
☆18May 17, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
jisang93 / VISinger
View on GitHub
Unofficial pytorch implementation of VISinger: Variational Inference with Adversarial Learning for End-to-end Singing Voice Synthesis (IC…
☆20May 12, 2023Updated 3 years ago
WangHelin1997 / Aty-TTS
View on GitHub
Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech
☆11May 14, 2025Updated last year
Lhx94As / E2E-language-diarization
View on GitHub
Source code of paper <End-to-End Language Diarization for Bilingual Code-switching Speech>
☆19Jan 23, 2022Updated 4 years ago
Auroraaa86 / LCS-CTC
View on GitHub
For IEEE ASRU(2025)
☆15Jun 21, 2025Updated last year
light1726 / SpeechTripleNet
View on GitHub
The implementation of paper "SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody"
☆33Nov 23, 2023Updated 2 years ago
uthree / ddsp-vocoder
View on GitHub
☆12Nov 7, 2024Updated last year
ajaybati / miipher2.0
View on GitHub
Reimplementation of Miipher
☆30Aug 16, 2023Updated 2 years ago
YuanGongND / llm_speech_emotion_challenge
View on GitHub
☆23Jun 24, 2024Updated 2 years ago
ankitapasad / layerwise-analysis
View on GitHub
Layer-wise analysis of self-supervised pre-trained speech representations
☆135Oct 18, 2024Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
oatsu-gh / enunu_kodoku_singing
View on GitHub
22人で童謡を5曲ずつ歌ってつくった歌唱データベースです。
☆15Aug 7, 2022Updated 3 years ago
HarunoriKawano / BEST-RQ
View on GitHub
Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in Pytorch.
☆96May 25, 2023Updated 3 years ago
samsad35 / code-ancogen
View on GitHub
[ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
☆14Mar 11, 2025Updated last year
Infinity-INF / fast-phasr
View on GitHub
Phonemes and durations labeling based on whisper small
☆11Jul 7, 2024Updated 2 years ago
roudimit / c2kd
View on GitHub
Code for the C2KD paper (ICASSP 2023)
☆20May 15, 2023Updated 3 years ago
jhuang448 / MultilingualALT
View on GitHub
Repo of the paper "Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model""
☆15Jun 28, 2024Updated 2 years ago
thuhcsi / SnakeGAN
View on GitHub
Please visit https://thuhcsi.github.io/SnakeGAN/
☆37Apr 25, 2023Updated 3 years ago
lifeiteng / SoundStorm
View on GitHub
☆71Jul 13, 2023Updated 3 years ago
Speech-Lab-IITM / Hindi-ASR-Challenge
View on GitHub
🎯 Speech Recognition Challenge by Speech Lab - IIT Madras
☆10Nov 5, 2020Updated 5 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
TTS-Research / PEL-TTS
View on GitHub
☆14Aug 16, 2023Updated 2 years ago
ryanrudes / YTTTS
View on GitHub
The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions
☆53Apr 1, 2021Updated 5 years ago
yuboona / some-script-to-help-using-Montreal-Forced-Aligner
View on GitHub
Some script for helping using Montreal Forced Aligner, maily for transforming Hanzi character to pinyin and extrat pause time from .textg…
☆14Feb 9, 2024Updated 2 years ago
bshall / urhythmic
View on GitHub
Unsupervised Rhythm Modeling for Voice Conversion
☆85Aug 3, 2023Updated 2 years ago
mubtasimahasan / DM-Codec
View on GitHub
Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”
☆57Jun 1, 2025Updated last year
sarahjuan / iban
View on GitHub
☆14Jun 12, 2015Updated 11 years ago
chenqi008 / pymcd
View on GitHub
Package pymcd
☆40Sep 8, 2022Updated 3 years ago