prompteus / audio-captioning
Audio captioning - DCASE challenge 2023 task 6a
☆21 Updated 9 months ago
Related projects
Alternatives and complementary repositories for audio-captioning
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs ☆47 Updated last month
- Torch implementation of Whisper-guided DDPM based Voice Conversion ☆49 Updated last year
- PyTorch implementation of WaveFit [2022, Google], one of the SOTA lightweight/fast speech vocoders. ☆47 Updated last month
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆93 Updated last week
- Masked Modeling Duo: Towards a Universal Audio Pre-training Framework ☆76 Updated 3 months ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆39 Updated last week
- A collection of audio autoencoders, in PyTorch. ☆39 Updated last year
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆83 Updated last month
- Official code for Wav2Seq ☆95 Updated 2 years ago
- Training code and trained checkpoints for ASGAN. ☆60 Updated 10 months ago
- High-Fidelity Neural Phonetic Posteriorgrams ☆98 Updated 2 weeks ago
- Official implementation of MelHuBERT ☆65 Updated 3 weeks ago
- ☆31 Updated 2 weeks ago
- Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clusterin… ☆44 Updated last year
- VoiceLDM: Text-to-Speech with Environmental Context ☆163 Updated 3 months ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc… ☆71 Updated last year
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆42 Updated 2 months ago
- ☆32 Updated 2 months ago
- Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis. ☆87 Updated 2 years ago
- Zero-Shot Foreign Accent Conversion without a Native Reference ☆28 Updated 6 months ago
- PyTorch implementation of BigVSAN ☆198 Updated 7 months ago
- PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis ☆56 Updated 3 years ago
- A high-resolution implementation of HiFi-GAN Vocoder for Voice Conversion. ☆30 Updated last year
- UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation ☆72 Updated 3 years ago
- [Batching/MultiGPU/DataLoader Implemented] Code for the paper Hybrid Spectrogram and Waveform Source Separation ☆22 Updated last year
- Official implementation of Vec-Tok Speech ☆93 Updated last year
- A sequence-to-sequence voice conversion toolkit. ☆86 Updated 4 months ago
- Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation ☆23 Updated 8 months ago
- Unsupervised Rhythm Modeling for Voice Conversion ☆80 Updated last year
- Official Implementation of StyleTTS-VC ☆164 Updated last year