clement-pages / gryannote
Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.
☆69 · Updated last month
Alternatives and similar repositories for gryannote
Users who are interested in gryannote are comparing it to the libraries listed below.
- Open TTS models, built for streaming on the edge ☆44 · Updated 8 months ago
- Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆128 · Updated 4 months ago
- A package for NeuCodec: a 50hz, 0.8kbps, 24kHz audio codec. ☆127 · Updated 2 months ago
- Collection of Open Source Speech Data ☆162 · Updated 2 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆91 · Updated 2 years ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆103 · Updated last year
- Official implementation of the TTS model Lina-Speech ☆175 · Updated 11 months ago
- Speaker Diarization with Transformers ☆69 · Updated 6 months ago
- VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency ☆174 · Updated last month
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆25 · Updated 8 months ago
- ☆62 · Updated last year
- Google's SoundStorm: Efficient Parallel Audio Generation ☆130 · Updated 2 years ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆194 · Updated 2 months ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆19 · Updated last year
- VoiceBox neural network implementation ☆110 · Updated last year
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆14 · Updated 3 weeks ago
- ☆261 · Updated last year
- VALL-E 2 reproduction ☆133 · Updated last year
- Audio tokenization, in the fastest way possible! ☆53 · Updated last year
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆80 · Updated last year
- A TTS model that makes a speaker speak new languages ☆76 · Updated last year
- An unofficial PyTorch implementation of VALL-E ☆88 · Updated 4 months ago
- ☆92 · Updated last month
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆113 · Updated 11 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆127 · Updated 4 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆133 · Updated 6 months ago
- ☆205 · Updated last month
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆45 · Updated 2 months ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation ☆151 · Updated last year
- [EMNLP Main '25] LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation ☆138 · Updated 6 months ago