ituvisionlab / EdVAE
Official PyTorch implementation of "EdVAE: Mitigating Codebook Collapse with Evidential Discrete Variational Autoencoders"
★11, updated 7 months ago
Alternatives and similar repositories for EdVAE
Users who are interested in EdVAE are comparing it to the repositories listed below.
- An ODE-based generative neural vocoder using Rectified Flow ★60, updated 2 years ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ★67, updated 8 months ago
- A neural speech codec based on discrete WavLM representations ★24, updated 8 months ago
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ★32, updated last year
- A spoken version of the textual story cloze benchmark ★17, updated last year
- Event Relation in Text-to-Audio (TTA) Generation ★17, updated 2 months ago
- Official repo of ICASSP 2024 paper - Generative De-Quantization for Neural Speech Codec via Latent Diffusion. ★55, updated 4 months ago
- ★23, updated 7 months ago
- A toolkit for researchers in the multimodal sound separation. ★16, updated last year
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ★36, updated last year
- Source code for DM-Codec. ★41, updated 6 months ago
- ★25, updated 9 months ago
- The demo page for ALMTokenizer ★48, updated last month
- A lightweight audio codec based on a single quantizer ★58, updated last month
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ★16, updated 6 months ago
- (ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec ★33, updated this week
- Variable Bitrate Residual Vector Quantization for Audio Coding ★41, updated 2 weeks ago
- Learning and controlling the source-filter representation of speech with a variational autoencoder ★45, updated 2 years ago
- [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In… ★44, updated last year
- ★61, updated last year
- Viterbi decoding in PyTorch ★32, updated last month
- A minimal PyTorch Implementation of Stochastically Quantized Variational AutoEncoder (SQ-VAE) by Sony ★31, updated last year
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ★38, updated 11 months ago
- Please visit https://thuhcsi.github.io/SnakeGAN/ ★36, updated 2 years ago
- ★13, updated 2 months ago
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis ★13, updated last month
- Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech ★10, updated last year
- SRTNet ★24, updated 2 years ago
- ★46, updated 4 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ★45, updated 7 months ago