The VoxTube dataset official repository
☆71 · Feb 14, 2024 · Updated 2 years ago
Alternatives and similar repositories for VoxTube
Users interested in VoxTube are comparing it to the repositories listed below.
- ☆10 · Sep 19, 2022 · Updated 3 years ago
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset. ☆28 · Apr 16, 2024 · Updated last year
- The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition" ☆186 · Sep 24, 2025 · Updated 5 months ago
- Exploring Binary Classification Loss for Speaker Verification ☆18 · Jul 18, 2023 · Updated 2 years ago
- A simple command line tool to calculate WER for ASR. ☆14 · Oct 14, 2024 · Updated last year
- Models and code for the INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model ☆13 · Mar 30, 2025 · Updated 11 months ago
- Pronunciation-assisted Subword Modeling ☆31 · May 30, 2019 · Updated 6 years ago
- A handy dataset of noises for ASR ☆22 · May 29, 2019 · Updated 6 years ago
- Voice activity detection and speaker gender segmentation audiovisual corpus ☆16 · Jan 20, 2025 · Updated last year
- [ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z… ☆32 · Apr 8, 2022 · Updated 3 years ago
- ☆15 · Jul 11, 2022 · Updated 3 years ago
- ☆17 · Apr 14, 2023 · Updated 2 years ago
- Official Repository For VoxBlink2 ☆85 · Aug 13, 2024 · Updated last year
- Visual Speech Recognition ☆19 · Dec 24, 2024 · Updated last year
- Spherical residual vector quantization (SRVQ) ☆31 · Aug 25, 2024 · Updated last year
- Implementation of DCComix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer ☆75 · Aug 21, 2023 · Updated 2 years ago
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset ☆12 · Sep 29, 2025 · Updated 5 months ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 8 months ago
- ☆13 · Nov 22, 2022 · Updated 3 years ago
- Repository for reproducing the results in the journal paper "Self-supervised learning for Speech Emotion Recognition" ☆10 · Mar 15, 2023 · Updated 2 years ago
- ☆11 · Jun 14, 2024 · Updated last year
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs ☆57 · Nov 19, 2025 · Updated 3 months ago
- Speech Resynthesis and Language Modeling ☆27 · Jun 11, 2025 · Updated 8 months ago
- Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 202… ☆27 · May 17, 2023 · Updated 2 years ago
- Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment ☆13 · Feb 5, 2025 · Updated last year
- Neural model for prediction of stress position in Russian words ☆13 · Jun 22, 2025 · Updated 8 months ago
- ☆36 · Jan 6, 2026 · Updated 2 months ago
- ☆37 · Jun 30, 2022 · Updated 3 years ago
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning ☆53 · Jan 18, 2024 · Updated 2 years ago
- Word Discovery in Visually Grounded, Self-Supervised Speech Models ☆26 · Dec 4, 2023 · Updated 2 years ago
- Reimplementation of Miipher ☆29 · Aug 16, 2023 · Updated 2 years ago
- 📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi. ☆22 · Jul 12, 2019 · Updated 6 years ago
- ☆62 · Nov 6, 2023 · Updated 2 years ago
- ☆13 · Oct 27, 2021 · Updated 4 years ago
- A JAX library for building lattice-based speech transducer models ☆47 · Updated this week
- ☆39 · Oct 1, 2023 · Updated 2 years ago
- HiFTNet wav/audio super-resolution 16/24 kHz to 48 kHz ☆24 · Jan 2, 2024 · Updated 2 years ago
- ☆63 · Jun 28, 2023 · Updated 2 years ago
- ERISHA is a multilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for… ☆43 · Dec 17, 2020 · Updated 5 years ago