jack-tol / youtube-to-audio
A lightweight Python package and command-line interface (CLI) tool that extracts audio from YouTube videos and playlists in multiple formats, such as MP3, WAV, OGG, AAC, and FLAC.
☆13 · Updated 6 months ago
Alternatives and similar repositories for youtube-to-audio
Users who are interested in youtube-to-audio are comparing it to the libraries listed below.
- PlayHT Python SDK - AI Text-to-Speech Streaming & Voice Cloning API ☆216 · Updated last month
- Video+code lecture on building nanoGPT from scratch ☆69 · Updated last year
- Speaker Diarization with Transformers ☆69 · Updated 3 months ago
- ☆62 · Updated last year
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆68 · Updated last week
- Efficient approach to speaker diarization using voice characteristics extraction ☆100 · Updated 3 months ago
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ☆97 · Updated last year
- ☆127 · Updated 5 months ago
- Joint speech-language model - respond directly to audio! ☆372 · Updated last year
- 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning ☆157 · Updated last year
- ☆262 · Updated last year
- The next evolution of Agents ☆47 · Updated this week
- ☆158 · Updated 2 years ago
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution. ☆54 · Updated 9 months ago
- Collection of Open Source Speech Data ☆160 · Updated this week
- Open TTS models, built for streaming on the edge ☆42 · Updated 6 months ago
- Sing an idea ➡️ AI music sample🔥🎶 ☆118 · Updated last year
- ☆37 · Updated last year
- ☆207 · Updated last year
- ☆246 · Updated 3 weeks ago
- A simple system for 2-way interruptible voice interactions between human and LLM ☆30 · Updated last year
- Arxflix turns your boring Arxiv research paper into a captivating video. ☆54 · Updated 2 weeks ago
- Chat to Compose Video ☆195 · Updated last year
- Voxtral: Convert Mistral into an end2end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GP… ☆30 · Updated 6 months ago
- A WebRTC server that allows you to interact with an LLM using your speech and responds back with generated audio. ☆136 · Updated last year
- Maybe the new state of the art vision model? We'll see 🤷‍♂️ ☆166 · Updated last year
- ☆175 · Updated last year
- Joint speech-language model - respond directly to audio! ☆30 · Updated last year
- ☆24 · Updated last year
- ☆116 · Updated 9 months ago