Forced alignment decoder for Whisper.
☆15Mar 13, 2024Updated 2 years ago
Alternatives and similar repositories for Whisper-Alignment
Users that are interested in Whisper-Alignment are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments.☆18Aug 1, 2025Updated 8 months ago
- DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors☆38Feb 11, 2025Updated last year
- (WIP)long form speech generatoins☆31Apr 2, 2025Updated last year
- Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Buil…☆49Mar 19, 2026Updated 3 weeks ago
- Collection of scripts from mHuBERT-147.☆34Nov 19, 2024Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- ☆36Sep 6, 2025Updated 7 months ago
- Official PyTorch implementation of (ICME2025 oral) "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-…☆24Feb 1, 2026Updated 2 months ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated last year
- Text-to-Speech Benchmark☆23Apr 2, 2026Updated 2 weeks ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval☆13Jun 27, 2025Updated 9 months ago
- Repo of the paper "Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model""☆15Jun 28, 2024Updated last year
- ☆12Mar 11, 2025Updated last year
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.☆22May 26, 2025Updated 10 months ago
- A neural speech codec based on discrete WavLM representations☆26Aug 28, 2024Updated last year
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …☆93Dec 28, 2024Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆115Jan 28, 2026Updated 2 months ago
- source code of EfficientTTS 2☆20Feb 18, 2024Updated 2 years ago
- faster inference☆28Jan 20, 2025Updated last year
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- Unofficial implementation of ConvNeXt-TTS powered by lightning☆18Oct 20, 2024Updated last year
- ☆15Aug 22, 2025Updated 7 months ago
- a Neural Vocoder supporting Ring Attention, Conformer and NSF.☆25Aug 1, 2025Updated 8 months ago
- Text-To-Speech for NotebookLM☆39Jul 20, 2025Updated 8 months ago
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- Estimating musical surprisal/information content in Audio☆26Updated this week
- Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”☆57Jun 1, 2025Updated 10 months ago
- A toolkit to calculate speech audio quality. Not affiliated with the original authors☆72Aug 13, 2024Updated last year
- [ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows☆131Sep 2, 2025Updated 7 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆62Nov 1, 2024Updated last year
- CTC decoder with hotwords for ASR.☆35Apr 13, 2025Updated last year
- StyleTTS 2 Optimized Training Fork☆33Feb 2, 2025Updated last year
- Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995☆79Dec 3, 2024Updated last year
- ☆25Mar 6, 2024Updated 2 years ago
- Deploy open-source AI quickly and easily - Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆97Nov 9, 2024Updated last year
- ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations☆21Sep 21, 2025Updated 6 months ago
- [ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion☆91Jul 23, 2025Updated 8 months ago
- Acoustic echo cancelation(AEC) is a main algorithm in the pipe line of acoustic devices with KWS or ASR. FNLMS is used.☆19Apr 22, 2019Updated 6 years ago
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…☆111Dec 20, 2024Updated last year
- GPT-style network for phonemization with durations of text☆68Mar 21, 2024Updated 2 years ago