Layer-wise analysis of self-supervised pre-trained speech representations
☆127Oct 18, 2024Updated last year
Alternatives and similar repositories for layerwise-analysis
Users that are interested in layerwise-analysis are comparing it to the libraries listed below
Sorting:
- ☆31Jul 13, 2023Updated 2 years ago
- Official implementation of MelHuBERT☆68Feb 21, 2026Updated last week
- Word Discovery in Visually Grounded, Self-Supervised Speech Models☆26Dec 4, 2023Updated 2 years ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT☆40Aug 29, 2024Updated last year
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…☆77Jul 16, 2023Updated 2 years ago
- ☆13Sep 25, 2024Updated last year
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.☆21May 26, 2025Updated 9 months ago
- Implementation of SoundStorm built upon SpeechTokenizer.☆116Nov 2, 2023Updated 2 years ago
- ☆15Nov 11, 2024Updated last year
- PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions☆83Oct 11, 2024Updated last year
- speaker-disentangled speech linguistic content quantizer☆24Mar 19, 2025Updated 11 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆114Jan 28, 2026Updated last month
- A toolkit to calculate speech audio quality. Not affiliated with the original authors☆69Aug 13, 2024Updated last year
- Source code and demo for INTERPSEECH 2023 paper: DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion P…☆37Dec 5, 2023Updated 2 years ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆183Updated this week
- S3PRL-VC: A Voice Conversion Toolkit based on S3PRL☆101Jun 26, 2024Updated last year
- Collection of scripts from mHuBERT-147.☆32Nov 19, 2024Updated last year
- ASR text preprocessing utility☆21Aug 5, 2024Updated last year
- ☆19Apr 28, 2023Updated 2 years ago
- LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for edit and control the target language's phoneme…☆23Aug 14, 2025Updated 6 months ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.☆13Jun 2, 2023Updated 2 years ago
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023☆12May 13, 2024Updated last year
- **Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speec…☆101Apr 10, 2025Updated 10 months ago
- ☆19Sep 10, 2024Updated last year
- Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning☆96Nov 20, 2024Updated last year
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- [AAAI 2024] Code for CTX-vec2wav in UniCATS☆130Jun 11, 2024Updated last year
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context☆214Sep 10, 2024Updated last year
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.☆48Sep 4, 2023Updated 2 years ago
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction☆269May 19, 2024Updated last year
- INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"☆117Jan 26, 2024Updated 2 years ago
- UTokyo-SaruLab MOS Prediction System☆298Feb 23, 2026Updated last week
- [INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆64Jun 16, 2025Updated 8 months ago
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis☆147Jan 1, 2025Updated last year
- ☆41May 15, 2023Updated 2 years ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)☆152Sep 14, 2023Updated 2 years ago
- VAE modified from Descript Audio Codec, which replaces the RVQ with VAE☆88Apr 2, 2024Updated last year
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- Implementation of CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning☆48Nov 8, 2023Updated 2 years ago