clement-pages / gryannote
Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.
☆68 Updated last week
Alternatives and similar repositories for gryannote
Users who are interested in gryannote are comparing it to the libraries listed below.
- Open TTS models, built for streaming on the edge ☆43 Updated 7 months ago
- Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ☆128 Updated 2 months ago
- Collection of Open Source Speech Data ☆161 Updated 3 weeks ago
- A package for NeuCodec: a 50hz, 0.8kbps, 24kHz audio codec. ☆104 Updated 3 weeks ago
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆24 Updated 7 months ago
- VoiceBox neural network implementation ☆110 Updated last year
- ☆62 Updated last year
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆43 Updated last month
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆80 Updated last year
- VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency ☆159 Updated this week
- Speaker Diarization with Transformers ☆69 Updated 4 months ago
- Official implementation of the TTS model Lina-Speech ☆170 Updated 9 months ago
- Google's SoundStorm: Efficient Parallel Audio Generation ☆132 Updated 2 years ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆90 Updated 2 years ago
- Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆15 Updated last year
- Joint speech-language model - respond directly to audio! ☆30 Updated last year
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ☆137 Updated 2 years ago
- ☆262 Updated last year
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆185 Updated last month
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆102 Updated last year
- LongCat Audio Tokenizer and Detokenizer ☆178 Updated last week
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆104 Updated 10 months ago
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS. ☆85 Updated 11 months ago
- An unofficial PyTorch implementation of VALL-E ☆88 Updated 2 months ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆19 Updated 11 months ago
- VALL-E 2 reproduction ☆131 Updated last year
- Audio tokenization, in the fastest way possible! ☆53 Updated last year
- A TTS model that makes a speaker speak new languages ☆76 Updated last year
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation ☆151 Updated last year
- Speaker change detection using SincNet and an LSTM/Transformer ☆55 Updated 5 months ago