Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.
☆39 Mar 4, 2024 Updated 2 years ago
Alternatives and similar repositories for vqwordseg
Users who are interested in vqwordseg are comparing it to the libraries listed below.
- Transformer-based visually grounded speech models ☆19 Sep 22, 2022 Updated 3 years ago
- Word Discovery in Visually Grounded, Self-Supervised Speech Models ☆27 Dec 4, 2023 Updated 2 years ago
- Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech" ☆28 Feb 22, 2022 Updated 4 years ago
- Fast and differentiable hidden Markov model in C++ ☆19 Jan 20, 2023 Updated 3 years ago
- ESLTTS dataset ☆16 Feb 6, 2025 Updated last year
- ☆31 Jul 13, 2023 Updated 2 years ago
- Code for "Phoneme Segmentation Using Self-Supervised Speech Models", Strgar & Harwath, Proceedings of the IEEE Spoken Language Technology… ☆55 Nov 4, 2022 Updated 3 years ago
- [INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In… ☆45 Mar 25, 2024 Updated 2 years ago
- Implementation of multi-level Contrastive Predictive Coding (CPC) methods ☆20 Jan 12, 2023 Updated 3 years ago
- Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models ☆62 Jul 1, 2025 Updated 9 months ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model ☆35 Aug 27, 2023 Updated 2 years ago
- ☆12 Aug 25, 2023 Updated 2 years ago
- Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer ☆93 Jun 9, 2022 Updated 3 years ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024 ☆34 Mar 14, 2025 Updated last year
- ☆11 Nov 5, 2021 Updated 4 years ago
- multilingual speech aligner ☆76 Nov 19, 2023 Updated 2 years ago
- Clean and modernized implementation of FastSpeech2/LightSpeech using IPA ☆18 Aug 16, 2024 Updated last year
- Data manipulation and transformation for audio signal processing, powered by PyTorch ☆11 Sep 30, 2024 Updated last year
- PolEval 2021 Task 1 ☆15 Jun 28, 2022 Updated 3 years ago
- Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks ☆17 Aug 18, 2023 Updated 2 years ago
- X (weighted / probabilistic) Context-Free Grammars ☆25 Jan 30, 2024 Updated 2 years ago
- Vector-Quantized Contrastive Predictive Coding for Acoustic Unit Discovery and Voice Conversion ☆143 Sep 1, 2020 Updated 5 years ago
- Evaluation tool used in the BigVSAN paper ☆14 Mar 22, 2024 Updated 2 years ago
- Layer-wise analysis of self-supervised pre-trained speech representations ☆131 Oct 18, 2024 Updated last year
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat… ☆33 Jun 14, 2024 Updated last year
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆151 Sep 14, 2023 Updated 2 years ago
- Lightweight Speech Representation Learning for One-Shot Voice Conversion ☆24 Dec 12, 2024 Updated last year
- Project repository for the work done in Triplet Entropy Loss: Improving The Generalization of Short Speech Language Identification Syst… ☆13 Feb 17, 2021 Updated 5 years ago
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning ☆54 Jan 18, 2024 Updated 2 years ago
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆17 May 16, 2025 Updated 11 months ago
- DUSTED: Spoken-Term Discovery using Discrete Speech Units ☆18 Oct 2, 2024 Updated last year
- Code for the C2KD paper (ICASSP 2023) ☆19 May 15, 2023 Updated 2 years ago
- ☆21 Feb 27, 2024 Updated 2 years ago
- Transcribing Speech with Multinomial Diffusion, training code and models. ☆80 Sep 27, 2023 Updated 2 years ago
- Deep Speech Distances PyTorch ☆29 Feb 21, 2022 Updated 4 years ago
- **Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speec… ☆102 Apr 10, 2025 Updated last year
- C++ version of pyannote audio overlapped speech detection pipeline ☆13 Feb 14, 2024 Updated 2 years ago
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023 ☆12 May 13, 2024 Updated last year
- LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for edit and control the target language's phoneme… ☆24 Aug 14, 2025 Updated 8 months ago