Layer-wise analysis of self-supervised pre-trained speech representations
☆127Oct 18, 2024Updated last year
Alternatives and similar repositories for layerwise-analysis
Users that are interested in layerwise-analysis are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Word Discovery in Visually Grounded, Self-Supervised Speech Models☆26Dec 4, 2023Updated 2 years ago
- ☆31Jul 13, 2023Updated 2 years ago
- Official implementation of MelHuBERT☆69Feb 21, 2026Updated last month
- ☆13Sep 25, 2024Updated last year
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT☆41Aug 29, 2024Updated last year
- **Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speec…☆101Apr 10, 2025Updated 11 months ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…☆77Jul 16, 2023Updated 2 years ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model☆34Aug 27, 2023Updated 2 years ago
- Self-Supervised Speech Pre-training and Representation Learning Toolkit☆2,538Mar 12, 2026Updated last week
- [INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆64Jun 16, 2025Updated 9 months ago
- Collection of scripts from mHuBERT-147.☆32Nov 19, 2024Updated last year
- ASR text preprocessing utility☆21Aug 5, 2024Updated last year
- ☆19Apr 28, 2023Updated 2 years ago
- Transformer-based visually grounded speech models☆19Sep 22, 2022Updated 3 years ago
- Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer☆92Jun 9, 2022Updated 3 years ago
- [NeurIPS 2022] "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Spee…☆17Sep 19, 2023Updated 2 years ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆183Feb 28, 2026Updated 3 weeks ago
- Sylber: Syllabic Embedding Representation of Speech from Raw Audio☆74Mar 17, 2025Updated last year
- Implementation of SoundStorm built upon SpeechTokenizer.☆116Nov 2, 2023Updated 2 years ago
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.☆22May 26, 2025Updated 9 months ago
- LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT☆74Sep 26, 2022Updated 3 years ago
- PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions☆85Oct 11, 2024Updated last year
- ☆46Feb 16, 2023Updated 3 years ago
- S3PRL-VC: A Voice Conversion Toolkit based on S3PRL☆101Mar 15, 2026Updated last week
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…☆650Jun 9, 2024Updated last year
- ☆15Nov 11, 2024Updated last year
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context☆216Sep 10, 2024Updated last year
- (SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition☆13Oct 22, 2024Updated last year
- speaker-disentangled speech linguistic content quantizer☆24Mar 19, 2025Updated last year
- A toolkit to calculate speech audio quality. Not affiliated with the original authors☆70Aug 13, 2024Updated last year
- A PyTorch implementation of the universal neural vocoder☆67Nov 6, 2020Updated 5 years ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆114Jan 28, 2026Updated last month
- Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.☆39Mar 4, 2024Updated 2 years ago
- AudioCodec-Hub is a Python library for encoding and decoding audio data, supporting various neural audio codec models☆25Sep 26, 2023Updated 2 years ago
- This is the code for paper: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs☆89Sep 19, 2025Updated 6 months ago
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023☆12May 13, 2024Updated last year
- [AAAI 2024] Code for CTX-vec2wav in UniCATS☆130Jun 11, 2024Updated last year
- ☆41May 15, 2023Updated 2 years ago
- The official repository of Dynamic-SUPERB.☆199Jun 24, 2025Updated 8 months ago