pkadambi/Wav2TextGrid

Speaker adaptive forced alignment (phonetic segmentation) using Wav2Vec2

☆23

Alternatives and similar repositories for Wav2TextGrid

Users who are interested in Wav2TextGrid are comparing it to the repositories listed below.

donghoney0416 / DeepASA
Official page of "DeepASA: An Object-Oriented Multi-Purpose Network for Auditory Scene Analysis"
☆26 · Apr 15, 2026 · Updated 3 months ago
lmaxwell / McHuo
A Chinese singing voice dataset, professional male singer, 105 songs, 132 minutes
☆12 · Oct 19, 2023 · Updated 2 years ago
rpuggaardrode / praatpicture
Make Praat Picture style plots of acoustic data
☆37 · Updated this week
line / WaveTrainerFit
Official implementation of "Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech G…
☆16 · Feb 6, 2026 · Updated 5 months ago
facebookresearch / spidr-adapt
This repository contains the checkpoints and training code for the few-shot adaptation speech models in the SpidR-Adapt paper.
☆23 · Dec 29, 2025 · Updated 6 months ago
HuPER29 / HuPER
☆16 · Mar 19, 2026 · Updated 4 months ago
violet-liang / soundfield-reconstruction-np
Sound field reconstruction using neural processes with dynamic kernels
☆16 · Mar 25, 2025 · Updated last year
colstone / SOFA_AI
SOFA_AI: Singing-Oriented Forced Aligner for Automatic Inference
☆28 · May 28, 2024 · Updated 2 years ago
UtaUtaUtau / nnsvs-db-converter
Python script to convert NNSVS DBs to Diffsinger without the NNSVS Python Library
☆34 · Jul 30, 2025 · Updated 11 months ago
KeiKinn / ParaCLAP
Towards a general language-audio model for computational paralinguistic tasks
☆30 · Dec 14, 2024 · Updated last year
facebookresearch / SS2_HRTF
SS2 HRTF Dataset - Reality Labs Research Audio
☆18 · May 22, 2026 · Updated 2 months ago
Berkeley-Speech-Group / DysfluentWFST
DysfluentWFST
☆19 · Nov 13, 2025 · Updated 8 months ago
BUTSpeechFIT / SOT-DiCoW
Multi-talker ASR based on DiCoW with Serialized Output Training
☆20 · Sep 18, 2025 · Updated 10 months ago
zengchang233 / CrossSinger
The source code for the paper CrossSinger (asru2023)
☆18 · Oct 12, 2023 · Updated 2 years ago
Infinity-INF / fast-phasr
Phonemes and durations labeling based on whisper small
☆11 · Jul 7, 2024 · Updated 2 years ago
sarulab-speech / SpatialCLAP
☆19 · Oct 9, 2025 · Updated 9 months ago
HaskinsLabs / get_vot
☆11 · May 14, 2017 · Updated 9 years ago
wilkinghoff / DSpAST
Code for the paper "DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models"
☆17 · Oct 23, 2025 · Updated 9 months ago
haoweilou / ParaStyleTTS
This is the official code for ACM CIKM 2025 Paper: ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive …
☆59 · Dec 21, 2025 · Updated 7 months ago
roudimit / Omni-R1
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
☆47 · Nov 21, 2025 · Updated 8 months ago
gwx314 / STARS
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation
☆85 · Nov 11, 2025 · Updated 8 months ago
taishi-n / torchrir
PyTorch-based room impulse response (RIR) simulation toolkit with dynamic scenes and GPU acceleration.
☆22 · Updated this week
audiolabs / anechoic-noise
Generator for anechoic, non-stationary noise signals
☆12 · Aug 12, 2022 · Updated 3 years ago
KdaiP / DC-Speech-VAE
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57 · Nov 19, 2025 · Updated 8 months ago
lingjzhu / charsiu
Charsiu: A neural phonetic aligner.
☆347 · Sep 19, 2022 · Updated 3 years ago
StarDawn-VirtualSinger / fast-phasr-next
☆10 · Nov 12, 2024 · Updated last year
JaeBinCHA7 / DEMUCS-for-Speech-Enhancement
We implemented the DEMUCS model for speech enhancement in the time-frequency domain, and additionally implemented HD-DEMUCS.
☆34 · Nov 8, 2023 · Updated 2 years ago
yongaifadian1 / MNV-17
Qwen2.5-Omni fine-tuned on MNV-17 dataset for nonverbal vocalization recognition
☆31 · Nov 13, 2025 · Updated 8 months ago
Jinbo-Hu / SELD-Data-Generator
Data generator for sound event localization and detection clips, including 4-ch microphone-array-format signals and first-order-ambisonic…
☆22 · Nov 13, 2024 · Updated last year
facebookresearch / spidr
This repository contains the training code from paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without…
☆57 · Updated this week
fluxions-ai / stftvae
Inference for the STFT-VAE continuous audio codec (24kHz, 3.125Hz latent)
☆43 · Jul 12, 2026 · Updated last week
pwdonh / audio_tokens
This is a JavaScript toolbox to perform online rating studies with auditory material.
☆18 · Nov 18, 2024 · Updated last year
SmartSoundKAIST / 6DRIR-DL
6 DoF Directional Room Impulse Response (RIR) with Dense Loudspeaker Grid
☆17 · Aug 31, 2023 · Updated 2 years ago
jeremychee4 / AffectSpeech
AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis
☆68 · Jun 12, 2026 · Updated last month
Audio-Reasoning-Challenge / Audio-Reasoning-Challenge-Baselines
The baselines of ARC-Challenge-Interspeech2026
☆60 · Dec 1, 2025 · Updated 7 months ago
stefanocoretta / speakr
speakr: A Wrapper for the Phonetic Software Praat
☆27 · Feb 28, 2026 · Updated 4 months ago
zhu-han / SpeechLLM
LLM-based ASR recipe with Zipformer encoder and Qwen LLM
☆35 · Sep 25, 2025 · Updated 10 months ago
Takaaki-Saeki / DiscreteSpeechMetrics
Reference-aware automatic speech evaluation toolkit
☆185 · Dec 5, 2024 · Updated last year
adelacvg / DPTTS
An AR+AR TTS attempt.
☆18 · Jan 13, 2025 · Updated last year