clement-pages / gryannote
Provides custom Gradio components that make diarization-based audio labeling easier and faster.
☆62 Updated last week
Alternatives and similar repositories for gryannote
Users who are interested in gryannote are comparing it to the libraries listed below
- Open TTS models, built for streaming on the edge ☆43 Updated 2 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆83 Updated last year
- VoiceBox neural network implementation ☆108 Updated 10 months ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆17 Updated 6 months ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆38 Updated this week
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆90 Updated 5 months ago
- Add n-gram and large language model support to Whisper models. ☆19 Updated last month
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆19 Updated 2 months ago
- ☆64 Updated last month
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆98 Updated 7 months ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆73 Updated 8 months ago
- Official implementation for FlowSep ☆50 Updated 5 months ago
- Audio tokenization, in the fastest way possible! ☆52 Updated 9 months ago
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… ☆90 Updated 4 months ago
- StyleTTS 2 Optimized Training Fork ☆29 Updated 4 months ago
- This is the audio sample repository for speech separation model "MossFormer2". ☆129 Updated 6 months ago
- ☆13 Updated last month
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… ☆93 Updated last month
- ☆50 Updated 2 months ago
- An unofficial PyTorch implementation of VALL-E ☆87 Updated this week
- ☆62 Updated 10 months ago
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆15 Updated 3 weeks ago
- High quality text-to-speech based on StyleTTS 2. ☆47 Updated last week
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ☆135 Updated 5 months ago
- Speaker change detection using SincNet and an LSTM/Transformer ☆51 Updated last week
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆36 Updated last week
- ☆40 Updated 3 months ago
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching ☆58 Updated last month
- Google's SoundStorm: Efficient Parallel Audio Generation ☆132 Updated last year
- StyleTTS2 + Vocos as a Decoder ☆12 Updated 2 months ago