The VoxTube dataset official repository
☆71Feb 14, 2024Updated 2 years ago
Alternatives and similar repositories for VoxTube
Users who are interested in VoxTube are comparing it to the libraries listed below.
- The official pytorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"☆191Sep 24, 2025Updated 6 months ago
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset.☆29Apr 16, 2024Updated 2 years ago
- Exploring Binary Classification Loss for Speaker Verification☆18Jul 18, 2023Updated 2 years ago
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated last year
- ☆10Sep 19, 2022Updated 3 years ago
- ☆15Jul 11, 2022Updated 3 years ago
- ☆11Jun 14, 2024Updated last year
- A JAX library for building lattice-based speech transducer models☆49Mar 2, 2026Updated last month
- A simple command line tool to calculate WER for ASR.☆14Oct 14, 2024Updated last year
- Visual Speech Recognition☆20Dec 24, 2024Updated last year
- [ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z…☆32Apr 8, 2022Updated 4 years ago
- Pronunciation-assisted Subword Modeling☆31May 30, 2019Updated 6 years ago
- A handy dataset of noises for ASR☆22May 29, 2019Updated 6 years ago
- Official Repository For VoxBlink2☆86Aug 13, 2024Updated last year
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval☆13Jun 27, 2025Updated 9 months ago
- Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment☆14Feb 5, 2025Updated last year
- Speaker change detection using SincNet and an LSTM/Transformer☆58May 26, 2025Updated 10 months ago
- Implementation of DCComix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer☆74Aug 21, 2023Updated 2 years ago
- ☆17Apr 14, 2023Updated 3 years ago
- Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 202…☆27May 17, 2023Updated 2 years ago
- Automatically setup the AISHELL-4 and MSDWild dataset for usage with pyannote-database (and pyannote-audio)☆15Oct 22, 2025Updated 5 months ago
- Word Discovery in Visually Grounded, Self-Supervised Speech Models☆27Dec 4, 2023Updated 2 years ago
- Spherical residual vector quantization (SRVQ)☆31Aug 25, 2024Updated last year
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning☆54Jan 18, 2024Updated 2 years ago
- 📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi.☆22Jul 12, 2019Updated 6 years ago
- Voice activity detection and speaker gender segmentation audiovisual corpus☆16Jan 20, 2025Updated last year
- This repository contains official pytorch implementation and pre-trained models for the MR-RawNet.☆17Jun 12, 2024Updated last year
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs☆57Nov 19, 2025Updated 4 months ago
- ☆37Jun 30, 2022Updated 3 years ago
- ☆14Nov 22, 2022Updated 3 years ago
- Collection of scripts from mHuBERT-147.☆34Nov 19, 2024Updated last year
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset☆13Sep 29, 2025Updated 6 months ago
- Self-Supervised Speech/Sound Pre-training and Representation Learning Toolkit☆13Nov 18, 2022Updated 3 years ago
- ☆64Nov 6, 2023Updated 2 years ago
- ☆36Jan 6, 2026Updated 3 months ago
- Neural model for prediction of stress position in Russian words☆13Jun 22, 2025Updated 9 months ago
- Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in Pytorch.☆93May 25, 2023Updated 2 years ago
- ERISHA is a multilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for…☆44Dec 17, 2020Updated 5 years ago
- ☆64Jun 28, 2023Updated 2 years ago