Layer-wise analysis of self-supervised pre-trained speech representations
☆129Oct 18, 2024Updated last year
Alternatives and similar repositories for layerwise-analysis
Users that are interested in layerwise-analysis are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Word Discovery in Visually Grounded, Self-Supervised Speech Models☆27Dec 4, 2023Updated 2 years ago
- ☆31Jul 13, 2023Updated 2 years ago
- Official implementation of MelHuBERT☆69Feb 21, 2026Updated last month
- ☆13Sep 25, 2024Updated last year
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT☆41Aug 29, 2024Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- **Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speec…☆102Apr 10, 2025Updated last year
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…☆77Jul 16, 2023Updated 2 years ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model☆35Aug 27, 2023Updated 2 years ago
- Self-Supervised Speech Pre-training and Representation Learning Toolkit☆2,543Mar 12, 2026Updated last month
- [INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆64Jun 16, 2025Updated 9 months ago
- Collection of scripts from mHuBERT-147.☆34Nov 19, 2024Updated last year
- ASR text preprocessing utility☆21Aug 5, 2024Updated last year
- ☆19Apr 28, 2023Updated 2 years ago
- Transformer-based visually grounded speech models☆19Sep 22, 2022Updated 3 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer☆93Jun 9, 2022Updated 3 years ago
- [NeurIPS 2022] "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Spee…☆17Sep 19, 2023Updated 2 years ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆185Feb 28, 2026Updated last month
- Sylber: Syllabic Embedding Representation of Speech from Raw Audio☆76Mar 17, 2025Updated last year
- Implementation of SoundStorm built upon SpeechTokenizer.☆117Nov 2, 2023Updated 2 years ago
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.☆22May 26, 2025Updated 10 months ago
- LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT☆74Sep 26, 2022Updated 3 years ago
- PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions☆86Oct 11, 2024Updated last year
- ☆46Feb 16, 2023Updated 3 years ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- S3PRL-VC: A Voice Conversion Toolkit based on S3PRL☆101Mar 15, 2026Updated 3 weeks ago
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…☆652Jun 9, 2024Updated last year
- ☆15Nov 11, 2024Updated last year
- (SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition☆13Oct 22, 2024Updated last year
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context☆217Sep 10, 2024Updated last year
- speaker-disentangled speech linguistic content quantizer☆25Mar 19, 2025Updated last year
- A toolkit to calculate speech audio quality. Not affiliated with the original authors☆72Aug 13, 2024Updated last year
- A PyTorch implementation of the universal neural vocoder☆67Nov 6, 2020Updated 5 years ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆115Jan 28, 2026Updated 2 months ago
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.☆39Mar 4, 2024Updated 2 years ago
- AudioCodec-Hub is a Python library for encoding and decoding audio data, supporting various neural audio codec models☆25Sep 26, 2023Updated 2 years ago
- This is the code for paper: XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs☆92Sep 19, 2025Updated 6 months ago
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023☆12May 13, 2024Updated last year
- [AAAI 2024] Code for CTX-vec2wav in UniCATS☆130Jun 11, 2024Updated last year
- ☆41May 15, 2023Updated 2 years ago
- The official repository of Dynamic-SUPERB.☆199Jun 24, 2025Updated 9 months ago