Madhuvod / VoxLingua
A model (maybe an app) that translates the audio of a video from one language to another, cloning the voice from the original video in the translated audio
☆15 · Updated 6 months ago
Alternatives and similar repositories for VoxLingua
Users that are interested in VoxLingua are comparing it to the libraries listed below
- SpeechPlus: Small LLM-Based Text-to-Speech Library ☆16 · Updated 6 months ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Updated 5 months ago
- A composition of offline tools to achieve high quality multilingual speech to text transcription ☆23 · Updated last week
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆15 · Updated 6 months ago
- Arabic Grapheme-to-Phoneme (G2P) Conversion ☆13 · Updated 9 months ago
- This repository includes training, inference, evaluation, and utility scripts developed for fine-tuning the Whisper medium.en model on Ai… ☆19 · Updated last year
- LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for editing and controlling the target language's phoneme… ☆21 · Updated 4 months ago
- ☆29 · Updated last month
- Forced alignment decoder for Whisper. ☆14 · Updated last year
- Supervoice diffusion enhance ☆27 · Updated last year
- C++ version of pyannote audio overlapped speech detection pipeline ☆13 · Updated last year
- Sing any popular song with your voice ☆11 · Updated 3 years ago
- ☆17 · Updated 8 months ago
- Text-to-Speech Latency Benchmark ☆21 · Updated 5 months ago
- chatterbox TTS + Voice Clone using onnx ☆26 · Updated last month
- LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform ☆18 · Updated last year
- Zero-shot voice cloning text-to-speech (TTS) with explicit emotion class conditioning built on F5-TTS ☆25 · Updated last week
- Transfer learning approach to pronunciation scoring ☆11 · Updated last year
- Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks ☆17 · Updated 2 years ago
- ☆11 · Updated last year
- specifications and documentation for the Open Voice Interoperability Initiative Project ☆21 · Updated this week
- Official implementation of the paper "Distilling a Pretrained Language Model to a Multilingual ASR Model" (Interspeech 2022) ☆12 · Updated last year
- Onset-and-Offset-Aware Sound Event Detection ☆20 · Updated 10 months ago
- This repository contains all the code necessary for running the multilingual distilwhisper from Ferraz et al. 2024 IEEE ICASSP paper. ☆32 · Updated last month
- ☆14 · Updated last year
- ☆23 · Updated last week
- ☆13 · Updated 4 years ago
- Getting confidences from any end-to-end system ☆11 · Updated 2 years ago
- a Neural Vocoder supporting Ring Attention, Conformer and NSF. ☆23 · Updated 4 months ago
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a… ☆12 · Updated 2 years ago