retkowsky / audio_embeddings
Audio search using Azure Cognitive Search
☆22Updated last year
Alternatives and similar repositories for audio_embeddings:
Users that are interested in audio_embeddings are comparing it to the libraries listed below
- Open TTS models, built for streaming on the edge☆39Updated 2 weeks ago
- Joint speech-language model - respond directly to audio!☆30Updated 10 months ago
- Multi-Modal Language Modeling with Image, Audio and Text Integration, including multiple images and multiple audio clips in a single multi-turn conversation.☆17Updated last year
- Speaker Diarization with Transformers☆64Updated 10 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆60Updated 2 weeks ago
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.☆27Updated last year
- Audio tokenization, in the fastest way possible!☆49Updated 7 months ago
- ☆84Updated last week
- A lightweight Python library for running TTS models with a unified API.☆17Updated last month
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor…☆57Updated 11 months ago
- ☆10Updated 11 months ago
- A simple system for 2-way interruptible voice interactions between human and LLM☆23Updated last year
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model.☆12Updated 2 years ago
- Repository contains code to fine-tune WhisperASR model☆23Updated 2 years ago
- Agentic RAG to help you build a startup🚀☆16Updated 2 weeks ago
- Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models☆152Updated last year
- A python package for whisper normalizer☆53Updated 3 weeks ago
- Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.☆35Updated 2 years ago
- ☆43Updated last month
- Using short models to classify long texts☆21Updated 2 years ago
- Build Agentic workflows with function calling using open LLMs☆26Updated this week
- ☆62Updated 8 months ago
- ☆14Updated 6 months ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation☆143Updated last year
- Create an LJSpeech structured voice dataset on wave input☆27Updated 6 months ago
- Audio processing using deep neural networks. Speaker identification using voice embeddings.☆13Updated 2 years ago
- ☆84Updated 11 months ago
- Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterance using the CST5 augmentati…☆37Updated 2 years ago
- The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tu…☆81Updated 6 months ago
- ☆45Updated 2 years ago