clement-pages / gryannote
Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.
★67 · Updated last month
Alternatives and similar repositories for gryannote
Users who are interested in gryannote compare it with the libraries listed below.
- Open TTS models, built for streaming on the edge ★43 · Updated 5 months ago
- Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ★119 · Updated 2 weeks ago
- ★86 · Updated this week
- Collection of Open Source Speech Data ★159 · Updated 9 months ago
- Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ★15 · Updated last year
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ★78 · Updated 10 months ago
- VoiceBox neural network implementation ★109 · Updated last year
- Speaker Diarization with Transformers ★69 · Updated 2 months ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ★18 · Updated 9 months ago
- ★62 · Updated last year
- Audio tokenization, in the fastest way possible! ★52 · Updated last year
- [EMNLP Main '25] LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation ★122 · Updated 3 months ago
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ★23 · Updated 5 months ago
- Google's SoundStorm: Efficient Parallel Audio Generation ★132 · Updated 2 years ago
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ★182 · Updated 4 months ago
- Official implementation of the TTS model Lina-Speech ★168 · Updated 7 months ago
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ★42 · Updated last month
- VALL-E 2 reproduction ★129 · Updated last year
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ★102 · Updated 10 months ago
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ★215 · Updated 3 months ago
- An unofficial PyTorch implementation of VALL-E ★88 · Updated 3 weeks ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ★100 · Updated 8 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ★88 · Updated last year
- Open-source reproducible benchmarks from Argmax ★53 · Updated this week
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… ★109 · Updated 4 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ★115 · Updated 2 months ago
- ★262 · Updated last year
- This is an implementation for train hifigan part of XTTSv2 model using Coqui/TTS. ★84 · Updated 9 months ago
- Add n-gram and large language model (LLM) support to Whisper models. ★31 · Updated 3 months ago
- This is a repository that collects common audio noise reduction models, using Gradio to demonstrate the use of each model, which is very … ★42 · Updated 8 months ago