Bai-YT / ConsistencyTTA
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
☆32 · Updated this week
Related projects
Alternatives and complementary repositories for ConsistencyTTA
- Unofficial download repository for MusicCaps ☆44 · Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ☆67 · Updated 2 weeks ago
- ☆34 · Updated 5 months ago
- ☆79 · Updated last year
- Open implementation of UNIVERSE and UNIVERSE++ diffusion-based speech enhancement models. ☆72 · Updated 2 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆88 · Updated 4 months ago
- PAM is a no-reference audio quality metric for audio generation tasks ☆49 · Updated 4 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆48 · Updated last month
- Official Implementation of EnCLAP (ICASSP 2024) ☆90 · Updated 5 months ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆42 · Updated 2 months ago
- ☆40 · Updated 5 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆67 · Updated last week
- ☆51 · Updated 3 weeks ago
- ☆45 · Updated this week
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ☆99 · Updated 3 weeks ago
- The open source code for SimpleSpeech series ☆111 · Updated last month
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆84 · Updated 2 months ago
- An official implementation of the ICASSP 2024 paper: Dual-Path TFC-TDF UNet for Music Source Separation ☆81 · Updated 8 months ago
- ☆20 · Updated 10 months ago
- ☆47 · Updated last week
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆50 · Updated last year
- Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆104 · Updated last month
- ☆55 · Updated 11 months ago
- High fidelity, lightweight, end-to-end, streaming, convolution-based neural audio codec ☆84 · Updated last month
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆37 · Updated last month
- Unofficial PyTorch implementation of SoundStream with training code and 16kHz pretrained checkpoint ☆58 · Updated last year
- Zero-Shot Emotion Style Transfer ☆37 · Updated 7 months ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆18 · Updated 2 months ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆31 · Updated 10 months ago
- ☆58 · Updated 2 months ago