Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in Pytorch.
☆91May 25, 2023Updated 2 years ago
Alternatives and similar repositories for BEST-RQ
Users that are interested in BEST-RQ are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.☆133Sep 25, 2023Updated 2 years ago
- BigVGAN with Neural Source-Filter☆56Sep 21, 2023Updated 2 years ago
- E2E TTS using Conditional Flow Matching (Experimental*)☆71Nov 10, 2023Updated 2 years ago
- ☆25Mar 12, 2022Updated 4 years ago
- [AAAI 2024] Code for CTX-vec2wav in UniCATS☆130Jun 11, 2024Updated last year
- Implementation of TTS model based on NVIDIA P-Flow TTS Paper☆77May 12, 2024Updated last year
- Official implementation for the paper: A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Unit…☆83Jan 7, 2023Updated 3 years ago
- Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (…☆61Apr 4, 2024Updated last year
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers☆125Mar 20, 2025Updated last year
- ☆46Apr 16, 2023Updated 2 years ago
- VAE modified from Descript Audio Codec, which replaces the RVQ with VAE☆90Apr 2, 2024Updated last year
- Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing☆71Dec 2, 2022Updated 3 years ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…☆77Jul 16, 2023Updated 2 years ago
- Speaker embedding for VI-SVC and VI-SVS, alse for VITS; Use this to replace the ID to implement voice clone.☆30Sep 16, 2022Updated 3 years ago
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"☆372Sep 3, 2024Updated last year
- The demo page for ALMTokenizer☆59Apr 14, 2025Updated 11 months ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)☆152Sep 14, 2023Updated 2 years ago
- ☆51Mar 5, 2026Updated 2 weeks ago
- It's a repository for implementations of neural speech editing algorithms.☆204Jan 9, 2024Updated 2 years ago
- [InterSpeech'2024] FluentEditor:Text-based Speech Editing by Considering Acoustic and Prosody Consistency☆60Oct 23, 2024Updated last year
- An Open-source Streaming High-fidelity Neural Audio Codec☆500Mar 4, 2025Updated last year
- Try to replicate the architecture of MiniMaxTTS mentioned in it's technical report☆48Sep 2, 2025Updated 6 months ago
- A robust pitch tracker using synchro-squeezed fft and frequency domain autocorrelation☆36Jan 17, 2024Updated 2 years ago
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs☆76Dec 3, 2025Updated 3 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding☆50May 1, 2025Updated 10 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model☆57Oct 31, 2023Updated 2 years ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3☆435Sep 13, 2024Updated last year
- ☆69Jul 29, 2023Updated 2 years ago
- [ACMMM'2024] Generative Expressive Conversational Speech Synthesis☆44Oct 28, 2024Updated last year
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model☆295Oct 12, 2025Updated 5 months ago
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization☆110May 20, 2025Updated 10 months ago
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…☆650Jun 9, 2024Updated last year
- Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730☆131Dec 8, 2023Updated 2 years ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model☆34Aug 27, 2023Updated 2 years ago
- ☆100Jan 19, 2026Updated 2 months ago
- Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment☆68Jul 5, 2024Updated last year
- My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one☆26Aug 5, 2024Updated last year
- Official implementation of "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder" (AAAI2023)☆154Feb 1, 2023Updated 3 years ago
- Yin pitch estimator in PyTorch☆117Nov 7, 2022Updated 3 years ago