SELMA-project / ml4audio
audio, NLP, ML with huggingface, nvidia/nemo, speechbrain
☆11Updated last year
Alternatives and similar repositories for ml4audio:
Users that are interested in ml4audio are comparing it to the libraries listed below
- A lightweight Python library for running TTS models with a unified API.☆17Updated 2 months ago
- ☆11Updated last month
- Rust bindings for CTranslate2☆14Updated last year
- Easily turn large sets of audio urls to an audio dataset.☆21Updated 2 years ago
- I have created a dataset of Image-Text-Pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f…☆15Updated 4 years ago
- Self-supervised neural network for music recommendations.☆18Updated last year
- ☆23Updated last year
- An open source NLP as a service project focused on providing state of the art systems with ease. Training and inference by simple docker …☆20Updated 7 months ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec"☆12Updated 2 months ago
- Generate an accompaniment part with chords using an evolutionary algorithm.☆9Updated 2 years ago
- Audio processing using deep neural networks. Speaker identification using voice embeddings.☆13Updated 2 years ago
- Supervoice Speaker Separation Network☆12Updated 10 months ago
- Identifying individual speakers in an audio stream based on the unique characteristics found in individual voices using Python☆18Updated last year
- A repo with scripts to test and play around with Facebook's recent llama models! 🤗☆28Updated last year
- A project about learning how to synchronize subtitles in movies using machine learning.☆9Updated 2 years ago
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using…☆28Updated last year
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages☆13Updated 2 years ago
- ☆15Updated last year
- Cog wrapper for collabora/WhisperSpeech☆24Updated last year
- Describe the format of image/text datasets☆11Updated 2 years ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech☆22Updated 8 months ago
- ☆11Updated 9 years ago
- Zeta implementation of a reusable, plug-and-play feedforward from the paper "Exponentially Faster Language Modeling"☆16Updated 5 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆62Updated last week
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor…☆19Updated last year
- Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models☆15Updated last year
- ☆12Updated last year
- Open TTS models, built for streaming on the edge☆39Updated last month
- Tools for merging pretrained large language models.☆19Updated 10 months ago
- Contrastive Language-Audio Pretraining☆15Updated 3 years ago