SELMA-project / ml4audio
audio, NLP, ML with huggingface, nvidia/nemo, speechbrain
☆11 Updated last year
Alternatives and similar repositories for ml4audio
Users who are interested in ml4audio are comparing it to the libraries listed below.
- ☆15 Updated 3 months ago
- Easily turn large sets of audio urls to an audio dataset. ☆21 Updated 2 years ago
- Audio processing using deep neural networks. Speaker identification using voice embeddings. ☆13 Updated 2 years ago
- An open source NLP as a service project focused on providing state of the art systems with ease. Training and inference by simple docker … ☆20 Updated 9 months ago
- ☆23 Updated 2 years ago
- Identifying individual speakers in an audio stream based on the unique characteristics found in individual voices using Python ☆18 Updated 2 years ago
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using… ☆28 Updated 2 years ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec" ☆12 Updated 4 months ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 Updated 2 years ago
- Minimal, clean code for video/image "patchnization" - a process commonly used in tokenizing visual data for use in a Transformer encoder.… ☆11 Updated last year
- Interface for using TTS and vocoder models in the form of a text editor ☆20 Updated 2 years ago
- ☆20 Updated 3 years ago
- Finally, some decent sample sentences ☆23 Updated last year
- KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a l… ☆23 Updated 10 months ago
- Describe the format of image/text datasets ☆11 Updated 3 years ago
- Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax. ☆36 Updated 2 years ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆20 Updated 8 months ago
- Lyra V2 (SoundStream) running in the browser ☆19 Updated last year
- ☆11 Updated 10 years ago
- A simple uv workspace ☆12 Updated 2 months ago
- Cog wrapper for collabora/WhisperSpeech ☆25 Updated last year
- Putting flows on top of neural transducers for better TTS ☆62 Updated this week
- ☆14 Updated last year
- GPT-jax based on the official huggingface library ☆13 Updated 4 years ago
- Rust bindings for CTranslate2 ☆14 Updated 2 years ago
- ☆15 Updated 3 years ago
- Zero-shot Audio Classification using Whisper ☆79 Updated 2 years ago
- Simple script to re-rank images using OpenAI's CLIP https://github.com/openai/CLIP. ☆16 Updated 4 years ago
- Speaker diarization service ☆23 Updated 2 months ago
- Using short models to classify long texts ☆21 Updated 2 years ago