unoti / voice-embeddings
Audio processing using deep neural networks. Speaker identification using voice embeddings.
☆13 · Updated 2 years ago
Alternatives and similar repositories for voice-embeddings:
Users who are interested in voice-embeddings are comparing it to the repositories listed below.
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 · Updated 2 years ago
- 🎹 pyannote + notebook = pyannotebook ☆26 · Updated last year
- Similarity Learning applied to Speaker Verification and Semantic Textual Similarity ☆12 · Updated 4 years ago
- A cookiecutter template for building Hugging Face Spaces ☆11 · Updated 3 years ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech. ☆13 · Updated last year
- Zero-Shot Foreign Accent Conversion without a Native Reference ☆30 · Updated 10 months ago
- Identifying individual speakers in an audio stream based on the unique characteristics found in individual voices using Python ☆18 · Updated last year
- Emotion detection in audio utilising self-supervised representations trained with Contrastive Predictive Coding (CPC). ☆42 · Updated 3 years ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 · Updated last year
- Official PyTorch implementation of TTS Style Transfer ☆23 · Updated 2 years ago
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp… ☆12 · Updated last year
- The largest clinical study in the world to collect voice data labeled with health information (N>6,000 participants, 48 utterances… ☆28 · Updated 3 years ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs… ☆31 · Updated 3 years ago
- Simple text to phonemes converter for multiple languages ☆20 · Updated 2 years ago
- audio, NLP, ML with huggingface, nvidia/nemo, speechbrain ☆10 · Updated last year
- Implementation of "Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs" in PyTorch ☆18 · Updated last month
- Phoneme prediction from speech mel-spectrograms using RNN. ☆13 · Updated 5 years ago
- ☆15 · Updated 2 years ago
- This will hold the data pipeline to convert raw audio data to speech which will act as input dataset for speech-to-text pipeline ☆32 · Updated 2 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 · Updated 3 years ago
- Extract frequency, power, width and dissonance of formants from wav files ☆25 · Updated 2 years ago
- Feature extractor for DL speech processing. ☆65 · Updated 2 years ago
- REPeating Pattern Extraction Technique (REPET) in Python for audio source separation: original REPET, REPET extended, adaptive REPET, REP… ☆32 · Updated last year
- A simple voice conversion tool ☆17 · Updated 3 years ago
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model. ☆12 · Updated 2 years ago
- MaSS - Multilingual corpus of Sentence-aligned Spoken utterances ☆49 · Updated 5 months ago
- A mini, simple, and fast end-to-end automatic speech recognition toolkit. ☆50 · Updated 2 years ago
- Conditioned U-Net for Music Source Separation ☆20 · Updated 3 years ago
- ☆12 · Updated 5 years ago
- Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax. ☆35 · Updated 2 years ago