The VoxTube dataset official repository
☆71Feb 14, 2024Updated 2 years ago
Alternatives and similar repositories for VoxTube
Users that are interested in VoxTube are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The official pytorch implemention of the Intespeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"☆186Sep 24, 2025Updated 6 months ago
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset.☆28Apr 16, 2024Updated last year
- Exploring Binary Classification Loss for Speaker Verification☆18Jul 18, 2023Updated 2 years ago
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated 11 months ago
- ☆10Sep 19, 2022Updated 3 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- ☆15Jul 11, 2022Updated 3 years ago
- ☆11Jun 14, 2024Updated last year
- A JAX library for building lattice-based speech transducer models☆47Mar 2, 2026Updated 3 weeks ago
- A simple command line tool to calculate WER for ASR.☆14Oct 14, 2024Updated last year
- Visual Speech Recongnition☆20Dec 24, 2024Updated last year
- [ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z…☆32Apr 8, 2022Updated 3 years ago
- Pronunciation-assisted Subword Modeling☆31May 30, 2019Updated 6 years ago
- A handy dataset of noises for ASR☆22May 29, 2019Updated 6 years ago
- Official Repository For VoxBlink2☆86Aug 13, 2024Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval☆13Jun 27, 2025Updated 8 months ago
- Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment☆13Feb 5, 2025Updated last year
- IDVoice + IDLive Android demo app☆16Jan 23, 2025Updated last year
- Speaker change detection using SincNet and an LSTM/Transformer☆57May 26, 2025Updated 10 months ago
- Implementation of DCComix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer☆75Aug 21, 2023Updated 2 years ago
- ☆17Apr 14, 2023Updated 2 years ago
- Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 202…☆27May 17, 2023Updated 2 years ago
- Automatically setup the AISHELL-4 and MSDWild dataset for usage with pyannote-database (and pyannote-audio)☆15Oct 22, 2025Updated 5 months ago
- Word Discovery in Visually Grounded, Self-Supervised Speech Models☆26Dec 4, 2023Updated 2 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- Spherical residual vector quantization (SRVQ)☆31Aug 25, 2024Updated last year
- 📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi.☆22Jul 12, 2019Updated 6 years ago
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning☆54Jan 18, 2024Updated 2 years ago
- Voice activity detection and speaker gender segmentation audiovisual corpus☆16Jan 20, 2025Updated last year
- This repository contains official pytorch implementation and pre-trained models for the MR-RawNet.☆17Jun 12, 2024Updated last year
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs☆57Nov 19, 2025Updated 4 months ago
- ☆37Jun 30, 2022Updated 3 years ago
- ☆14Nov 22, 2022Updated 3 years ago
- Collection of scripts from mHuBERT-147.☆32Nov 19, 2024Updated last year
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- Self-Supervised Speech/Sound Pre-training and Representation Learning Toolkit☆13Nov 18, 2022Updated 3 years ago
- [INTERSPEECH 2024] Official code for VoxSim: A perceptual voice similarity dataset☆12Sep 29, 2025Updated 5 months ago
- ☆64Nov 6, 2023Updated 2 years ago
- ☆36Jan 6, 2026Updated 2 months ago
- IDVoice + IDLive iOS demo app☆11Mar 15, 2024Updated 2 years ago
- Neural model for prediction of stress position in Russian words☆13Jun 22, 2025Updated 9 months ago
- Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in Pytorch.☆91May 25, 2023Updated 2 years ago