jakariaemon / WSI
Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification.
☆17 Updated last month
Alternatives and similar repositories for WSI:
Users who are interested in WSI are comparing it to the repositories listed below.
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 Updated last month
- Open TTS models, built for streaming on the edge ☆41 Updated last month
- ☆31 Updated last month
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching ☆54 Updated 2 weeks ago
- A lightweight, efficient variation of the StyleTTS 2 text‐to‐speech model. ☆15 Updated this week
- High quality text-to-speech based on StyleTTS 2. ☆39 Updated this week
- Codebase and project page for EDMSound ☆34 Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆87 Updated 4 months ago
- ☆35 Updated last year
- a Neural Vocoder supporting Ring Attention, Conformer and NSF. ☆18 Updated 2 months ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆38 Updated 3 weeks ago
- StyleTTS 2 Optimized Training Fork ☆28 Updated 3 months ago
- ☆50 Updated last month
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆71 Updated 7 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆52 Updated 6 months ago
- An official implementation of Style-Talker for Spoken Dialogue Generation ☆17 Updated 3 months ago
- ☆62 Updated 9 months ago
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆15 Updated last year
- Audio tokenization, in the fastest way possible! ☆52 Updated 8 months ago
- Official Code for ParrotTTS ☆50 Updated 6 months ago
- a Frontier Japanese Speech Generation net ☆34 Updated last month
- GPT for FACodec ☆13 Updated last year
- Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis ☆24 Updated last month
- ☆26 Updated 6 months ago
- Official implementation for FlowSep ☆45 Updated 4 months ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (such as fish/cosy/chattts). ☆73 Updated 2 weeks ago
- Zero-Shot Emotion Style Transfer ☆45 Updated 2 weeks ago
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆14 Updated 2 months ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆17 Updated 5 months ago
- ☆24 Updated this week