IDRnD/VoxTube

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/IDRnD/VoxTube)

IDRnD / VoxTube

The VoxTube dataset official repository

☆71

Alternatives and similar repositories for VoxTube

Users that are interested in VoxTube are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

IDRnD / redimnet
View on GitHub
The official pytorch implemention of the Intespeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
☆205Jul 9, 2026Updated 2 weeks ago
VoxBlink / ScriptsForVoxBlink
View on GitHub
A repo containing download guidance and corresponding scripts of the VoxBlink dataset.
☆30Apr 16, 2024Updated 2 years ago
Hunterhuan / sphereface2_speaker_verification
View on GitHub
Exploring Binary Classification Loss for Speaker Verification
☆18Jul 18, 2023Updated 3 years ago
backspacetg / distilXLSR
View on GitHub
Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model
☆13Mar 30, 2025Updated last year
Chung-I / youtube-asr-crawler
View on GitHub
☆10Sep 19, 2022Updated 3 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
yucongzh / online_speaker_diarization
View on GitHub
☆15Jul 11, 2022Updated 4 years ago
SSTC-Challenge / SSTC2024_baseline_system
View on GitHub
☆12Jun 14, 2024Updated 2 years ago
google-research / last
View on GitHub
A JAX library for building lattice-based speech transducer models
☆48Jul 2, 2026Updated 3 weeks ago
pkufool / simple-wer
View on GitHub
A simple command line tool to calculate WER for ASR.
☆14Oct 14, 2024Updated last year
VITA-Group / Audio-Lottery
View on GitHub
[ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z…
☆32Apr 8, 2022Updated 4 years ago
liu12366262626 / AlignVSR
View on GitHub
Visual Speech Recongnition
☆21Dec 24, 2024Updated last year
hainan-xv / PASM
View on GitHub
Pronunciation-assisted Subword Modeling
☆31May 30, 2019Updated 7 years ago
speechio / asr-noises
View on GitHub
A handy dataset of noises for ASR
☆22May 29, 2019Updated 7 years ago
VoxBlink2 / ScriptsForVoxBlink2
View on GitHub
Official Repository For VoxBlink2
☆88Aug 13, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
HHousen / speaker-change-detection
View on GitHub
Speaker change detection using SincNet and an LSTM/Transformer
☆57May 26, 2025Updated last year
yfyeung / DS-WED
View on GitHub
[ICASSP 2026] Official code for "Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration"
☆17Apr 16, 2026Updated 3 months ago
llm-lab-org / CLASP
View on GitHub
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
☆13Jun 27, 2025Updated last year
fgnt / speaker_reassignment
View on GitHub
Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment
☆14Feb 5, 2025Updated last year
alumae / streaming-punctuator
View on GitHub
☆17Apr 14, 2023Updated 3 years ago
lakahaga / dc-comix-tts
View on GitHub
Implementation of DCComix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
☆74Aug 21, 2023Updated 2 years ago
jasonppy / word-discovery
View on GitHub
Word Discovery in Visually Grounded, Self-Supervised Speech Models
☆27Dec 4, 2023Updated 2 years ago
Alexander-H-Liu / dinosr
View on GitHub
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
☆53Jan 18, 2024Updated 2 years ago
skit-ai / slu-prosody
View on GitHub
Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 202…
☆27May 17, 2023Updated 3 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
yluo42 / SRVQ
View on GitHub
Spherical residual vector quantization (SRVQ)
☆31Aug 25, 2024Updated last year
charlesliucn / LanMIT
View on GitHub
📖 LanMIT: A Toolkit for Improving Language Models in Low-resourced Speech Recognition based on Kaldi.
☆22Jul 12, 2019Updated 7 years ago
kimho1wq / MR-RawNet
View on GitHub
This repository contains official pytorch implementation and pre-trained models for the MR-RawNet.
☆17Jun 12, 2024Updated 2 years ago
ina-foss / InaGVAD
View on GitHub
Voice activity detection and speaker gender segmentation audiovisual corpus
☆16Jan 20, 2025Updated last year
PalabraAI / redimnet2
View on GitHub
This repository contains the official implementation and pretrained weights for the paper "ReDimNet2: Scaling Speaker Verification via Ti…
☆65Jul 9, 2026Updated 2 weeks ago
Hertin / WavPrompt
View on GitHub
☆37Jun 30, 2022Updated 4 years ago
haoheliu / ontology-aware-audio-tagging
View on GitHub
☆14Nov 22, 2022Updated 3 years ago
egruttadauria98 / SSpaVAlDo
View on GitHub
☆37Jan 6, 2026Updated 6 months ago
SonyCSLParis / ssl-singer-identity
View on GitHub
☆69Nov 6, 2023Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
HarunoriKawano / BEST-RQ
View on GitHub
Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in Pytorch.
☆96May 25, 2023Updated 3 years ago
Koziev / StressModel
View on GitHub
Neural model for prediction of stress position in Russian words
☆13Jun 22, 2025Updated last year
jiaqili3 / DualCodec
View on GitHub
[Interspeech 2025] DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec
☆72Mar 11, 2026Updated 4 months ago
skakouros / s3prl_attentive_correlation
View on GitHub
Self-Supervised Speech/Sound Pre-training and Representation Learning Toolkit
☆13Nov 18, 2022Updated 3 years ago
nikvaessen / disjoint-mtl
View on GitHub
Research code for "Towards multi-task learning of speech and speaker recognition" at https://arxiv.org/pdf/2302.12773.pdf
☆12Dec 2, 2024Updated last year
ajinkyakulkarni14 / ERISHA
View on GitHub
ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for…
☆44Dec 17, 2020Updated 5 years ago
lucasnewman / vocos-mlx
View on GitHub
Implementation of 'Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis', in MLX
☆24Oct 30, 2024Updated last year