ituvisionlab / EdVAE
Official PyTorch implementation of "EdVAE: Mitigating Codebook Collapse with Evidential Discrete Variational Autoencoders"
☆11 Updated 9 months ago
Alternatives and similar repositories for EdVAE
Users that are interested in EdVAE are comparing it to the libraries listed below
- ☆23 Updated 8 months ago
- A neural speech codec based on discrete WavLM representations ☆24 Updated 9 months ago
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ☆32 Updated last year
- A spoken version of the textual story cloze benchmark ☆17 Updated last year
- Source code for DM-Codec. ☆45 Updated 3 weeks ago
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme… ☆26 Updated last month
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆70 Updated 10 months ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆36 Updated last year
- A toolkit for researchers in the multimodal sound separation. ☆16 Updated last year
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis ☆18 Updated 2 months ago
- Official repo of ICASSP 2024 paper - Generative De-Quantization for Neural Speech Codec via Latent Diffusion. ☆55 Updated 5 months ago
- SRTNet ☆24 Updated 2 years ago
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ☆16 Updated 7 months ago
- (ICASSP 2025) Learning Source Disentanglement in Neural Audio Codec ☆34 Updated last month
- An ODE-based generative neural vocoder using Rectified Flow ☆59 Updated 2 years ago
- Whisper Speech Quality Assessment (WhiSQA) ☆9 Updated 6 months ago
- Event Relation in Text-to-Audio (TTA) Generation ☆19 Updated 4 months ago
- [INTERSPEECH 2025] Official code for "SEED: Speaker Embedding Enhancement Diffusion" ☆13 Updated last month
- The demo page for ALMTokenizer ☆51 Updated 2 months ago
- Official repository for the paper "Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs" ☆16 Updated 9 months ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs) ☆28 Updated last week
- ☆13 Updated 3 months ago
- Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models ☆18 Updated 11 months ago
- Learning and controlling the source-filter representation of speech with a variational autoencoder ☆45 Updated 2 years ago
- A minimal Pytorch Implementation of Stochastically Quantized Variational AutoEncoder (SQ-VAE) by Sony ☆31 Updated last year
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆47 Updated 8 months ago
- Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024). ☆25 Updated 8 months ago
- ☆26 Updated 10 months ago
- Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges. ☆45 Updated 3 weeks ago
- Sequence alignment methods with helpers for PyTorch. ☆24 Updated 2 years ago