jjihwan / Voice-Cloning
Simple, Unified Repository for Retrieval-based Voice Conversion
☆17 Updated last year
Alternatives and similar repositories for Voice-Cloning
Users who are interested in Voice-Cloning are comparing it to the repositories listed below.
- Run Retrieval-based Voice Conversion training and inference with ease. ☆11 Updated 7 months ago
- A curated list of resources in audio visual question answering and related area. :-) ☆12 Updated 2 months ago
- ☆10 Updated last year
- ☆10 Updated last year
- Multilingual-Speech-Synthesis-Voice-Conversion Using Bark + RVC ☆14 Updated 5 months ago
- Enabling the use of multiple modalities while prompting Stable Diffusion ☆15 Updated 2 years ago
- Diffusion Model for Voice Conversion ☆17 Updated 2 years ago
- The Land-Diffuser is a novel application of the Denoising Diffusion Probabilistic Model (DDPM) in the realm of 3D Talking Head generation… ☆13 Updated last year
- This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. … ☆12 Updated 9 months ago
- ☆12 Updated last year
- ☆14 Updated 2 years ago
- [TOMM 2024] Automatic Lyric Transcription and Automatic Music Transcription from Multimodal Singing ☆25 Updated last year
- ☆38 Updated 2 months ago
- Music production for silent film clips. ☆28 Updated 4 months ago
- This is a winter of code project aimed at speech enhancement of text to speech models. ☆24 Updated 3 years ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs ☆28 Updated last week
- RVC Onnx Infer- Upgraded and simplified-ish ☆23 Updated last year
- SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech ☆11 Updated 2 years ago
- ☆26 Updated 8 months ago
- AudioBERT 📢 : Audio Knowledge Augmented Language Model (ICASSP 2025) ☆41 Updated 7 months ago
- ☆14 Updated last year
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 Updated last year
- A collection of all our phonemeizers for dataset construction and inference ☆26 Updated 7 months ago
- ☆15 Updated last year
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ☆80 Updated 3 months ago
- Apply an end-to-end model structure (ViT + GPT) to describe images in more detail, rather than traditional image captioning that only pro… ☆11 Updated 8 months ago
- ☆20 Updated last year
- Official Implementation (Pytorch) of "Constant Acceleration Flow", NeurIPS 2024 ☆33 Updated 7 months ago
- Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition" ☆26 Updated 2 years ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆15 Updated last year