ituvisionlab / EdVAE
Official PyTorch implementation of "EdVAE: Mitigating Codebook Collapse with Evidential Discrete Variational Autoencoders"
☆10 Updated 4 months ago
Alternatives and similar repositories for EdVAE:
Users who are interested in EdVAE are comparing it to the libraries listed below:
- A spoken version of the textual story cloze benchmark ☆14 Updated last year
- Source code for DM-Codec. ☆34 Updated 3 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆41 Updated 3 months ago
- A neural speech codec based on discrete WavLM representations ☆22 Updated 4 months ago
- Learning and controlling the source-filter representation of speech with a variational autoencoder ☆45 Updated last year
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆36 Updated last year
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ☆32 Updated 11 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆60 Updated 4 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆49 Updated 2 months ago
- (ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec ☆21 Updated 3 weeks ago
- An ODE-based generative neural vocoder using Rectified Flow ☆61 Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆51 Updated 2 months ago
- Please visit https://thuhcsi.github.io/SnakeGAN/ ☆36 Updated last year
- ☆37 Updated 7 months ago
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation" ☆22 Updated 9 months ago
- LLaSA: Scaling Train Time and Test Time Compute for LLaMA based Speech Synthesis ☆24 Updated this week
- Streaming Vocos ☆19 Updated last week
- A toolkit for researchers in multimodal sound separation. ☆16 Updated last year
- WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching ☆34 Updated last month
- DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement, Conference on Neural Information Processing Systems (NeurIPS), 2023 ☆44 Updated last week
- ☆21 Updated last year
- Event Relation in Text-to-Audio (TTA) Generation ☆17 Updated this week
- A minimal PyTorch implementation of Stochastically Quantized Variational AutoEncoder (SQ-VAE) by Sony ☆30 Updated last year
- Code for the paper "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching" ☆49 Updated this week
- ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation ☆22 Updated 10 months ago
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ☆15 Updated 2 months ago
- ☆15 Updated 6 months ago
- Official repo of ICASSP 2024 paper - Generative De-Quantization for Neural Speech Codec via Latent Diffusion. ☆48 Updated last week