Madhuvod / VoxLingua
A model (maybe an app) that translates the audio of a video from one language to another, cloning the voice from the original video for the translated audio
☆15Updated 6 months ago
Alternatives and similar repositories for VoxLingua
Users that are interested in VoxLingua are comparing it to the libraries listed below
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval☆13Updated 4 months ago
- LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for editing and controlling the target language's phoneme…☆17Updated 3 months ago
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…☆26Updated 2 months ago
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition.☆15Updated 6 months ago
- A composition of offline tools to achieve high quality multilingual speech to text transcription☆23Updated 2 months ago
- Text-to-Speech Latency Benchmark☆20Updated 4 months ago
- This repository includes training, inference, evaluation, and utility scripts developed for fine-tuning the Whisper medium.en model on Ai…☆18Updated last year
- Supervoice diffusion enhance☆27Updated last year
- Official implementation of the paper "Distilling a Pretrained Language Model to a Multilingual ASR Model" (Interspeech 2022)☆12Updated last year
- Arabic Grapheme-to-Phoneme (G2P) Conversion☆12Updated 8 months ago
- Code and Resources for "LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study", introducing methods to leverage LLMs for G…☆13Updated 6 months ago
- Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks☆17Updated 2 years ago
- Forced alignment decoder for Whisper.☆14Updated last year
- ☆17Updated 8 months ago
- A neural vocoder supporting Ring Attention, Conformer and NSF.☆22Updated 3 months ago
- Transfer learning approach to pronunciation scoring☆11Updated last year
- Zero-shot voice cloning text-to-speech (TTS) with explicit emotion class conditioning built on F5-TTS☆20Updated last week
- Sing any popular song with your voice☆11Updated 3 years ago
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a…☆12Updated 2 years ago
- Soniox Compare. Compare real-time voice AI side by side. No glossy charts, just results.☆14Updated 4 months ago
- ☆20Updated 2 months ago
- Onset-and-Offset-Aware Sound Event Detection☆20Updated 9 months ago
- Specifications and documentation for the Open Voice Interoperability Initiative Project☆20Updated last week
- PyTorch Implementation of [WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification](https://arxiv.or…☆16Updated 3 months ago
- The Vokan Architecture (Tsukasa speech based)☆10Updated 9 months ago
- C++ version of pyannote audio overlapped speech detection pipeline☆13Updated last year
- (WIP) A retrain of F5-TTS on permissively-licensed data☆13Updated 7 months ago
- Multivoice: Enhance your foreign-language movie and TV show experience with personalized dubbed versions. Our project uses voice cloning …☆26Updated 2 years ago
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion☆28Updated 2 months ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Updated 7 months ago