retkowsky / audio_embeddings
Audio search using Azure Cognitive Search
☆23Updated last year
Alternatives and similar repositories for audio_embeddings
Users that are interested in audio_embeddings are comparing it to the libraries listed below
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆62Updated 3 weeks ago
- Speaker Diarization with Transformers☆68Updated 2 weeks ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.☆27Updated last year
- ☆85Updated last year
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147.☆17Updated 7 months ago
- Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.☆36Updated 2 years ago
- ☆15Updated 3 months ago
- Joint speech-language model - respond directly to audio!☆30Updated last year
- Open TTS models, built for streaming on the edge☆43Updated 3 months ago
- ☆10Updated last year
- Speaker diarization service☆23Updated 2 months ago
- ☆62Updated 11 months ago
- Audio tokenization, in the fastest way possible!☆52Updated 10 months ago
- ☆104Updated 3 weeks ago
- Repository containing code to fine-tune the Whisper ASR model☆23Updated 2 years ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation☆146Updated last year
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.☆13Updated 2 years ago
- A lightweight Python library for running TTS models with a unified API.☆20Updated 4 months ago
- A minimalistic automatic speech recognition Streamlit-based web app powered by OpenAI's Whisper "State of the Art" models☆66Updated 2 years ago
- 🚀 Framework for seamless fine-tuning of Whisper model on a multi-lingual dataset and deployment to prod.☆27Updated 4 months ago
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model.☆12Updated 2 years ago
- TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog☆55Updated last year
- Speaker change detection using SincNet and an LSTM/Transformer☆52Updated last month
- Create an LJSpeech structured voice dataset on wave input☆30Updated 8 months ago
- ☆46Updated 2 years ago
- A Python package for the Whisper normalizer☆62Updated last week
- Identifying individual speakers in an audio stream based on the unique characteristics found in individual voices using Python☆18Updated 2 years ago
- Audio processing using deep neural networks. Speaker identification using voice embeddings.☆13Updated 2 years ago
- asr2k☆50Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023.☆84Updated last year