jack-tol / youtube-to-audio
A lightweight Python package and command-line interface (CLI) tool that extracts audio from YouTube videos and playlists in multiple formats, such as MP3, WAV, OGG, AAC, and FLAC.
☆12 · Updated 3 months ago
Alternatives and similar repositories for youtube-to-audio
Users who are interested in youtube-to-audio are comparing it to the libraries listed below:
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 · Updated last month
- ☆62 · Updated 11 months ago
- Open TTS models, built for streaming on the edge ☆43 · Updated 3 months ago
- Video+code lecture on building nanoGPT from scratch ☆68 · Updated last year
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ☆95 · Updated last year
- Trying to build an all in one speech-text language model - a bit like GPT-4o ☆22 · Updated last year
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution. ☆52 · Updated 6 months ago
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆39 · Updated 2 weeks ago
- Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in MLX ☆20 · Updated 8 months ago
- High-performance ASR tool using Faster Whisper, supporting custom models, multi-language transcription, and real-time processing feedback… ☆10 · Updated 8 months ago
- ☆107 · Updated last year
- Examples of apps built with Nendo, the AI Audio Tool Suite ☆55 · Updated last year
- Use quantized versions of Whisper to speed up inference ☆12 · Updated 8 months ago
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆90 · Updated last month
- create dataset from list of youtube links easily ☆20 · Updated 2 years ago
- Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX ☆27 · Updated 8 months ago
- Speaker Diarization with Transformers ☆68 · Updated 3 weeks ago
- Joint speech-language model - respond directly to audio! ☆30 · Updated last year
- ☆97 · Updated last year
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆18 · Updated 3 months ago
- Inference, Fine Tuning and many more recipes with Gemma family of models ☆38 · Updated this week
- Google's SoundStorm: Efficient Parallel Audio Generation ☆132 · Updated last year
- ☆258 · Updated last year
- ☆39 · Updated last year
- ☆127 · Updated 3 months ago
- Sing an idea ➡️ AI music sample🔥🎶 ☆113 · Updated last year
- Faster Tortoise inference than Tortoise Fast Fork ☆128 · Updated last year
- Cog wrapper for collabora/WhisperSpeech ☆25 · Updated last year
- Create an LJSpeech structured voice dataset on wave input ☆30 · Updated 9 months ago
- VoiceBox neural network implementation ☆109 · Updated 10 months ago