Stability-AI / stable-audio-tools
Generative models for conditional audio generation
☆2,833 · Updated last week
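For reference, generation with stable-audio-tools typically means loading a pretrained checkpoint and sampling with text plus timing conditioning. The sketch below follows the usage pattern published for the Stable Audio Open release; the checkpoint name, prompt, and sampler settings are illustrative assumptions rather than the only supported configuration.

```python
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained model and its config (checkpoint name is an assumption)
model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]
model = model.to(device)

# Text prompt plus timing conditioning (seconds_start / seconds_total)
conditioning = [{
    "prompt": "128 BPM tech house drum loop",
    "seconds_start": 0,
    "seconds_total": 30,
}]

# Sample from the conditional diffusion model
output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    sigma_min=0.3,
    sigma_max=500,
    sampler_type="dpmpp-3m-sde",
    device=device,
)

# Collapse the batch dimension, peak-normalize, and write a 16-bit WAV
output = rearrange(output, "b d n -> d (b n)")
output = (
    output.to(torch.float32)
    .div(torch.max(torch.abs(output)))
    .clamp(-1, 1)
    .mul(32767)
    .to(torch.int16)
    .cpu()
)
torchaudio.save("output.wav", output, sample_rate)
```

The `conditioning` entry pairs the text prompt with `seconds_start`/`seconds_total` timing values, which is how the requested clip length is communicated to the model.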
Alternatives and similar repositories for stable-audio-tools:
Users interested in stable-audio-tools are comparing it to the libraries listed below.
- Text-to-Audio/Music Generation ☆2,355 · Updated 3 months ago
- Official implementation of "Separate Anything You Describe" ☆1,670 · Updated last month
- AI powered speech denoising and enhancement ☆1,581 · Updated last month
- A webui for different audio related Neural Networks ☆1,110 · Updated 5 months ago
- TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, …) ☆1,947 · Updated last month
- A family of diffusion models for text-to-audio generation. ☆1,129 · Updated 2 weeks ago
- AudioLDM: Generate speech, sound effects, music and beyond, with text. ☆2,527 · Updated last month
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor… ☆580 · Updated 5 months ago
- zero-shot voice conversion & singing voice conversion, with real-time support ☆904 · Updated last week
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,265 · Updated 5 months ago
- [arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis ☆966 · Updated this week
- A simple, high-quality voice conversion tool focused on ease of use and performance. ☆1,984 · Updated this week
- Stable diffusion for real-time music generation ☆3,469 · Updated 5 months ago
- Versatile audio super resolution (any -> 48kHz) with AudioSR. ☆1,253 · Updated last week
- Foundational model for human-like, expressive TTS ☆3,979 · Updated 5 months ago
- The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt. ☆5,547 · Updated 6 months ago
- Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch ☆2,479 · Updated this week
- Lumina-T2X is a unified framework for Text to Any Modality Generation ☆2,126 · Updated 5 months ago
- Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images. ☆734 · Updated 3 months ago
- Contrastive Language-Audio Pretraining ☆1,500 · Updated last month
- [ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors ☆2,717 · Updated 4 months ago
- State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio. ☆1,253 · Updated 6 months ago
- Audio generation using diffusion models, in PyTorch. ☆2,002 · Updated last year
- The best OSS video generation models ☆2,718 · Updated last week
- AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, however supports a variety of adv… ☆1,320 · Updated last week
- Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch ☆3,209 · Updated last year
- MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation ☆2,389 · Updated 5 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆8,947 · Updated this week
- High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance ☆2,092 · Updated 3 months ago
- Inference and training library for high-quality TTS models. ☆4,910 · Updated last month