retkowsky / audio_embeddings
Audio search using Azure Cognitive Search
☆22 · Updated last year
Alternatives and similar repositories for audio_embeddings:
Users that are interested in audio_embeddings are comparing it to the libraries listed below
- Audio tokenization, in the fastest way possible! ☆51 · Updated 8 months ago
- A lightweight Python library for running TTS models with a unified API. ☆18 · Updated 2 months ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 · Updated last year
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor… ☆57 · Updated last year
- Open TTS models, built for streaming on the edge ☆39 · Updated last month
- ☆84 · Updated last year
- Repository containing code to fine-tune the WhisperASR model ☆23 · Updated 2 years ago
- ☆11 · Updated last month
- Joint speech-language model - respond directly to audio! ☆30 · Updated 11 months ago
- ☆38 · Updated 3 years ago
- Using short models to classify long texts ☆21 · Updated 2 years ago
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆14 · Updated last month
- A streaming Whisper server for on-prem transcription ☆20 · Updated 8 months ago
- Speaker Diarization with Transformers ☆64 · Updated 11 months ago
- ☆88 · Updated 2 weeks ago
- Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax. ☆35 · Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆34 · Updated 4 months ago
- Notebooks and scripts that showcase running quantized diffusion models on consumer GPUs ☆38 · Updated 5 months ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆17 · Updated 5 months ago
- Speaker change detection using SincNet and an LSTM/Transformer ☆50 · Updated 9 months ago
- A Python package for the Whisper normalizer ☆55 · Updated last week
- Speaker diarization service ☆21 · Updated last week
- ☆62 · Updated 9 months ago
- A minimalistic automatic speech recognition Streamlit-based web app powered by OpenAI's Whisper "State of the Art" models ☆66 · Updated 2 years ago
- Provides Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 · Updated 2 weeks ago
- ☆41 · Updated 2 months ago
- App to explore latent spaces of music collections ☆33 · Updated 11 months ago
- Multi-modal language modeling with image, audio, and text integration, including multiple images and multiple audio clips in a single multi-turn conversation. ☆17 · Updated last year
- OpenAI Whisper Prompt Examples ☆52 · Updated last year
- The YouTube Text-To-Speech dataset comprises waveform audio extracted from YouTube videos alongside their English transcriptions ☆51 · Updated 4 years ago