Official PyTorch Implementation of Our CVPR2023 Paper: "Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation"
☆64 · Updated Jul 21, 2023
Alternatives and similar repositories for MaskedVectorQuantization
Users who are interested in MaskedVectorQuantization are comparing it to the libraries listed below.
- Official PyTorch Implementation of Our CVPR2023 Paper: "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dyna… ☆192 · Updated Jul 23, 2023
- SLMTokBench for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" ☆37 · Updated Aug 29, 2023
- Rate-Adaptive Quantization: A Multi-Rate Codebook Adaptation for Vector Quantization-based Generative Models ☆15 · Updated Sep 10, 2025
- (no description) ☆13 · Updated Mar 11, 2025
- This is the official implementation of our multi-channel, multi-speaker, multi-spatial neural audio codec architecture. ☆51 · Updated Mar 17, 2025
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs ☆77 · Updated Dec 3, 2025
- [CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos' ☆13 · Updated Jun 16, 2024
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Updated Apr 10, 2025
- [ICCV 2023] Online Clustered Codebook ☆183 · Updated Sep 19, 2024
- A neural speech codec based on discrete WavLM representations ☆24 · Updated Aug 28, 2024
- The open-source code for the SimpleSpeech series ☆145 · Updated Oct 8, 2024
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat… ☆33 · Updated Jun 14, 2024
- (no description) ☆24 · Updated Sep 10, 2025
- (no description) ☆12 · Updated Jul 23, 2024
- (no description) ☆15 · Updated Sep 22, 2025
- Whisper Speech Quality Assessment (WhiSQA) ☆16 · Updated Oct 14, 2025
- A VAE modified from Descript Audio Codec, which replaces the RVQ with a VAE ☆88 · Updated Apr 2, 2024
- Official code for SongEcho ☆41 · Updated Feb 21, 2026
- This repository contains prompts & best practices for annotating audio clips in a very high degree of detail using Audio-Language-Models ☆35 · Updated Oct 13, 2024
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective". ☆64 · Updated Nov 5, 2025
- ICASSP 2024 - Generative De-Quantization for Neural Speech Codec via Latent Diffusion. ☆55 · Updated Nov 16, 2025
- (no description) ☆13 · Updated Sep 23, 2023
- Forced alignment decoder for Whisper. ☆14 · Updated Mar 13, 2024
- A spoken version of the textual story cloze benchmark ☆20 · Updated Aug 6, 2023
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ☆17 · Updated Nov 7, 2024
- AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data ☆35 · Updated Dec 31, 2023
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆62 · Updated Nov 1, 2024
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆213 · Updated Sep 19, 2024
- (no description) ☆31 · Updated Jul 13, 2023
- A low-bitrate single-codebook 16 / 24 kHz speech codec based on focal modulation ☆145 · Updated Nov 30, 2025
- This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M… ☆20 · Updated Jan 3, 2023
- Learnable Gammatone Filterbank (LGTFB) and Equal-loudness Normalization (EN) ☆12 · Updated Apr 24, 2020
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆31 · Updated Aug 30, 2025
- A real-time implementation of DDSP from Google Magenta. ☆15 · Updated Nov 8, 2021
- A neural vocoder supporting Ring Attention, Conformer and NSF. ☆24 · Updated Aug 1, 2025
- (no description) ☆17 · Updated Apr 5, 2024
- A partial implementation of Generative Infinite Vocabulary Transformer (GIVT) from Google DeepMind, in PyTorch. ☆21 · Updated Mar 28, 2024
- Experimental implementation for a sparse-dictionary based version of the VQ-VAE2 paper ☆35 · Updated Oct 27, 2023
- ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model ☆213 · Updated Apr 26, 2024