jakariaemon / WSI
Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification.
☆23Updated 5 months ago
Alternatives and similar repositories for WSI
Users who are interested in WSI are comparing it to the libraries listed below.
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆67Updated last month
- Open TTS models, built for streaming on the edge☆43Updated 5 months ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,…☆78Updated 11 months ago
- ☆62Updated last year
- A lightweight, efficient variation of the StyleTTS 2 text‐to‐speech model.☆37Updated 3 months ago
- ☆132Updated last week
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨☆119Updated 3 weeks ago
- An unofficial PyTorch implementation of VALL-E☆88Updated last month
- VALL-E 2 reproduction☆129Updated last year
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS.☆85Updated 9 months ago
- Voxtral: Convert Mistral into an end2end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GP…☆30Updated 5 months ago
- ☆275Updated last month
- The official Implementation of PeriodWave and PeriodWave-Turbo☆206Updated 4 months ago
- Collection of Open Source Speech Data☆159Updated 9 months ago
- ☆262Updated last year
- ☆39Updated last year
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers☆57Updated 3 months ago
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On…☆215Updated 3 months ago
- Official implementation of the TTS model Lina-Speech☆168Updated 7 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM☆271Updated 3 months ago
- High quality text-to-speech based on StyleTTS 2.☆60Updated last week
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"☆115Updated 3 months ago
- VoiceBox neural network implementation☆109Updated last year
- This is a repository that collects common audio noise reduction models, using Gradio to demonstrate the use of each model, which is very …☆42Updated 8 months ago
- ☆220Updated 3 months ago
- ☆289Updated last month
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec…☆109Updated 4 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆100Updated 8 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass.☆119Updated last month
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).☆83Updated this week