apple / dmel-demo
dMel: Speech Tokenization Made Simple
☆16Updated 8 months ago
Alternatives and similar repositories for dmel-demo
Users who are interested in dmel-demo are comparing it to the libraries listed below.
- ☆53Updated last year
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)☆94Updated last year
- Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs☆77Updated 2 months ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆67Updated last year
- ☆167Updated last year
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022☆118Updated 3 years ago
- VoiceLDM: Text-to-Speech with Environmental Context☆192Updated last year
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …☆91Updated last year
- Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"☆167Updated last year
- ☆49Updated 9 months ago
- Official release of StyleTalk dataset.☆72Updated last year
- The TTSDS benchmark evaluates synthetic speech quality by considering prosody, speaker identity, and intelligibility, comparing these fac…☆79Updated 4 months ago
- Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in Pytorch.☆91Updated 2 years ago
- Collection of scripts from mHuBERT-147.☆32Updated last year
- ☆43Updated 5 months ago
- Official code for Wav2Seq☆97Updated 3 years ago
- small audio language model for reasoning☆86Updated 2 months ago
- Putting flows on top of neural transducers for better TTS☆65Updated 3 weeks ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆127Updated last year
- Audiogen Codec☆144Updated last year
- Implementation of SoundStorm built upon SpeechTokenizer.☆116Updated 2 years ago
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning☆53Updated 2 years ago
- A toolkit to calculate speech audio quality. Not affiliated with the original authors☆66Updated last year
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".☆63Updated last year
- Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models☆59Updated 7 months ago
- Transcribing Speech with Multinomial Diffusion, training code and models.☆80Updated 2 years ago
- Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.☆133Updated 2 years ago
- List of direct speech-to-speech translation papers.☆38Updated 3 years ago
- LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning☆159Updated last year
- INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"☆116Updated 2 years ago