Transformer-based visually grounded speech models
☆19Sep 22, 2022Updated 3 years ago
Alternatives and similar repositories for FaST-VGS-Family
Users that are interested in FaST-VGS-Family are comparing it to the libraries listed below
Sorting:
- Word Discovery in Visually Grounded, Self-Supervised Speech Models☆26Dec 4, 2023Updated 2 years ago
- Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"☆28Feb 22, 2022Updated 4 years ago
- Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.☆39Mar 4, 2024Updated 2 years ago
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.☆48Sep 4, 2023Updated 2 years ago
- Lightweight speaker anonymization [IEEE SLT2021]☆27Jun 6, 2022Updated 3 years ago
- ☆15May 8, 2021Updated 4 years ago
- Non-Autoregressive Predictive Coding☆51Nov 3, 2020Updated 5 years ago
- multilingual speech aligner☆76Nov 19, 2023Updated 2 years ago
- Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5☆19Nov 29, 2022Updated 3 years ago
- ☆31Jul 13, 2023Updated 2 years ago
- Self-Supervised Speech Pre-training and Representation Learning Toolkit.☆10Feb 29, 2024Updated 2 years ago
- ☆19Apr 28, 2023Updated 2 years ago
- **ICASSP 2022** 《Toward Degradation-Robust Voice Conversion》Using speech enhancement and end-to-end denoising training to improve degrada…☆24Sep 27, 2022Updated 3 years ago
- Speaker embedding for VI-SVC and VI-SVS, alse for VITS; Use this to replace the ID to implement voice clone.☆30Sep 16, 2022Updated 3 years ago
- BERT and LSTM baseline models of the ZeroSpeech Challenge 2021☆60Oct 19, 2022Updated 3 years ago
- AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th…☆11Feb 23, 2024Updated 2 years ago
- An implementation of the Contrast Predictive Coding (CPC) method to train audio features in an unsupervised fashion.☆10Feb 22, 2022Updated 4 years ago
- Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer☆91Jun 9, 2022Updated 3 years ago
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning☆53Jan 18, 2024Updated 2 years ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model☆34Aug 27, 2023Updated 2 years ago
- Speech2vec pre-trained word vectors☆76Sep 8, 2018Updated 7 years ago
- provide SPHERE-formatted output as well as RIFF, AU, AIFF and raw☆14Dec 18, 2021Updated 4 years ago
- S3PRL for Speech Emotion Recognition (see s3prl > downstream)☆15Updated this week
- Textless Speech-to-Music Retrieval Using Emotion Similarity [ICASSP23]☆17Aug 16, 2023Updated 2 years ago
- ☆26Sep 22, 2022Updated 3 years ago
- Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.☆54Mar 30, 2022Updated 3 years ago
- Unsupervised spoken sentence embeddings☆14Dec 14, 2022Updated 3 years ago
- ☆17Mar 24, 2022Updated 3 years ago
- ☆13Mar 7, 2022Updated 3 years ago
- A spoken version of the textual story cloze benchmark☆20Aug 6, 2023Updated 2 years ago
- Diffusion Model for Voice Conversion☆17Oct 11, 2022Updated 3 years ago
- Implementation of the Rhythm Formant Analysis methodology for identifying speech rhythms and rhythm variation in the low frequency spectr…☆17Apr 27, 2023Updated 2 years ago
- The source code for the paper CrossSinger (asru2023)☆18Oct 12, 2023Updated 2 years ago
- Code for the C2KD paper (ICASSP 2023)☆19May 15, 2023Updated 2 years ago
- A toolkit for Spoken Language Understanding Evaluation (SLUE) benchmark. Refer paper https://arxiv.org/abs/2111.10367 for more details. O…☆66Feb 26, 2024Updated 2 years ago
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)☆20Mar 17, 2025Updated 11 months ago
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.☆13Oct 11, 2022Updated 3 years ago
- RepVgg + HiFiGAN☆36Aug 10, 2022Updated 3 years ago
- One-shot TTS with Improved Unseen Speaker and Style Transfer☆37Mar 2, 2022Updated 4 years ago