SELMA-project / ml4audio
audio, NLP, ML with huggingface, nvidia/nemo, speechbrain
☆11Updated last year
Alternatives and similar repositories for ml4audio
Users who are interested in ml4audio are comparing it to the libraries listed below
- ☆15Updated 2 months ago
- A lightweight Python library for running TTS models with a unified API.☆18Updated 3 months ago
- Easily turn large sets of audio urls to an audio dataset.☆21Updated 2 years ago
- Identifying individual speakers in an audio stream based on the unique characteristics found in individual voices using Python☆18Updated last year
- ☆22Updated last year
- Rust bindings for CTranslate2☆14Updated last year
- An open source NLP as a service project focused on providing state of the art systems with ease. Training and inference by simple docker …☆20Updated 8 months ago
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using…☆28Updated 2 years ago
- ☆16Updated 5 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler☆24Updated 4 years ago
- Experiments with generating GPT-2 fanfiction on specified topics.☆11Updated 6 years ago
- From a large speech audio file and its corresponding body of text, automatically chunk the audio and text into (phrase, audio_snippet) pa…☆17Updated 10 years ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages☆13Updated 2 years ago
- Minimal, clean code for video/image "patchnization" - a process commonly used in tokenizing visual data for use in a Transformer encoder.…☆11Updated last year
- A 🔥 cookiecutter template for building Hugging Face Spaces☆11Updated 3 years ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec"☆12Updated 4 months ago
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor…☆19Updated last year
- Cog wrapper for collabora/WhisperSpeech☆24Updated last year
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech☆22Updated 9 months ago
- Self-supervised neural network for music recommendations.☆18Updated last year
- This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. …☆12Updated 6 months ago
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp…☆13Updated last year
- I have created a dataset of Image-Text-Pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f…☆15Updated 4 years ago
- Floral Diffusion is a custom diffusion model trained by jags using a DD 5.6 version☆25Updated 2 years ago
- 🦖 X—LLM: Simple & Cutting Edge LLM Finetuning☆11Updated last year
- Melody Lyric Transformer Implementation and Model☆10Updated 2 years ago
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model.☆12Updated 2 years ago
- Karras et al. (2022) diffusion models for PyTorch☆17Updated last year
- Audio tokenization, in the fastest way possible!☆52Updated 9 months ago
- KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a l…☆23Updated 10 months ago