Transformer-based visually grounded speech models
☆19Sep 22, 2022Updated 3 years ago
Alternatives and similar repositories for FaST-VGS-Family
Users that are interested in FaST-VGS-Family are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Word Discovery in Visually Grounded, Self-Supervised Speech Models☆26Dec 4, 2023Updated 2 years ago
- Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"☆28Feb 22, 2022Updated 4 years ago
- Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.☆39Mar 4, 2024Updated 2 years ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model☆34Aug 27, 2023Updated 2 years ago
- Non-Autoregressive Predictive Coding☆51Nov 3, 2020Updated 5 years ago
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.☆48Sep 4, 2023Updated 2 years ago
- Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5☆19Nov 29, 2022Updated 3 years ago
- ☆15May 8, 2021Updated 4 years ago
- Code for the C2KD paper (ICASSP 2023)☆19May 15, 2023Updated 2 years ago
- Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer☆93Jun 9, 2022Updated 3 years ago
- multilingual speech aligner☆76Nov 19, 2023Updated 2 years ago
- BERT and LSTM baseline models of the ZeroSpeech Challenge 2021☆60Oct 19, 2022Updated 3 years ago
- Self-Supervised Speech Pre-training and Representation Learning Toolkit.☆10Feb 29, 2024Updated 2 years ago
- An implementation of the Contrast Predictive Coding (CPC) method to train audio features in an unsupervised fashion.☆10Feb 22, 2022Updated 4 years ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- Lightweight speaker anonymization [IEEE SLT2021]☆27Jun 6, 2022Updated 3 years ago
- A spoken version of the textual story cloze benchmark☆21Aug 6, 2023Updated 2 years ago
- ☆31Jul 13, 2023Updated 2 years ago
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning☆54Jan 18, 2024Updated 2 years ago
- Unsupervised spoken sentence embeddings☆14Dec 14, 2022Updated 3 years ago
- S3PRL for Speech Emotion Recognition (see s3prl > downstream)☆15Feb 28, 2026Updated 3 weeks ago
- AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th…☆11Feb 23, 2024Updated 2 years ago
- Speaker embedding for VI-SVC and VI-SVS, alse for VITS; Use this to replace the ID to implement voice clone.☆30Sep 16, 2022Updated 3 years ago
- Promting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation☆150Jan 16, 2024Updated 2 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- Textless Speech-to-Music Retrieval Using Emotion Similarity [ICASSP23]☆17Aug 16, 2023Updated 2 years ago
- Speech2vec pre-trained word vectors☆76Sep 8, 2018Updated 7 years ago
- Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models☆61Jul 1, 2025Updated 8 months ago
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022☆119Nov 25, 2022Updated 3 years ago
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.☆54Mar 30, 2022Updated 3 years ago
- **ICASSP 2022** 《Toward Degradation-Robust Voice Conversion》Using speech enhancement and end-to-end denoising training to improve degrada…☆24Sep 27, 2022Updated 3 years ago
- Layer-wise analysis of self-supervised pre-trained speech representations☆128Oct 18, 2024Updated last year
- ☆19Apr 28, 2023Updated 2 years ago
- Code for "Phoneme Segmentation Using Self-Supervised Speech Models", Strgar & Harwath, Proceedings of the IEEE Spoken Language Technology…☆55Nov 4, 2022Updated 3 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- provide SPHERE-formatted output as well as RIFF, AU, AIFF and raw☆14Dec 18, 2021Updated 4 years ago
- ☆42Mar 25, 2022Updated 4 years ago
- A toolkit for Spoken Language Understanding Evaluation (SLUE) benchmark. Refer paper https://arxiv.org/abs/2111.10367 for more details. O…☆66Feb 26, 2024Updated 2 years ago
- One-shot TTS with Improved Unseen Speaker and Style Transfer☆37Mar 2, 2022Updated 4 years ago
- ☆26Sep 22, 2022Updated 3 years ago
- ASR text preprocessing utility☆21Aug 5, 2024Updated last year
- ☆17Mar 24, 2022Updated 4 years ago