wnhsu/ResDAVEnet-VQ

Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"

☆28

Alternatives and similar repositories for ResDAVEnet-VQ

Users interested in ResDAVEnet-VQ are comparing it to the repositories listed below.

jasonppy / FaST-VGS-Family
Transformer-based visually grounded speech models
☆19 · Sep 22, 2022 · Updated 3 years ago
roudimit / AVLnet
Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.
☆54 · Mar 30, 2022 · Updated 4 years ago
dharwath / DAVEnet-pytorch
Deep Audio-Visual Embedding network (DAVEnet) implementation in PyTorch
☆66 · Aug 31, 2018 · Updated 7 years ago
kamperh / vqwordseg
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.
☆39 · May 5, 2026 · Updated 2 months ago
jasonppy / word-discovery
Word Discovery in Visually Grounded, Self-Supervised Speech Models
☆27 · Dec 4, 2023 · Updated 2 years ago
iamyuanchung / VQ-APC
Vector Quantized Autoregressive Predictive Coding (VQ-APC)
☆38 · Nov 11, 2020 · Updated 5 years ago
lstrgar / ss-phoneme-seg
Code for "Phoneme Segmentation Using Self-Supervised Speech Models", Strgar & Harwath, Proceedings of the IEEE Spoken Language Technology…
☆55 · Nov 4, 2022 · Updated 3 years ago
rhoposit / icassp2021
☆15 · May 8, 2021 · Updated 5 years ago
zerospeech / zerospeech2021_baseline
BERT and LSTM baseline models of the ZeroSpeech Challenge 2021
☆60 · Oct 19, 2022 · Updated 3 years ago
roudimit / c2kd
Code for the C2KD paper (ICASSP 2023)
☆20 · May 15, 2023 · Updated 3 years ago
chorowski-lab / CPC_audio
An implementation of the Contrastive Predictive Coding (CPC) method to train audio features in an unsupervised fashion.
☆10 · Feb 22, 2022 · Updated 4 years ago
Exgc / OpenSR
The official implementation of OpenSR (ACL2023 Oral)
☆17 · Nov 29, 2023 · Updated 2 years ago
Alexander-H-Liu / NPC
Non-Autoregressive Predictive Coding
☆51 · Nov 3, 2020 · Updated 5 years ago
Hertin / WavPrompt
☆37 · Jun 30, 2022 · Updated 4 years ago
ahaliassos / usr
Official implementation of USR (NeurIPS 2024)
☆40 · Dec 21, 2024 · Updated last year
Exgc / AVMuST-TED
☆24 · Mar 30, 2024 · Updated 2 years ago
W-Wu / DEER
☆12 · Aug 25, 2023 · Updated 2 years ago
JeongHun0716 / e-mvsr
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)
☆20 · Mar 17, 2025 · Updated last year
zhaoyanpeng / xcfg
X (weighted / probabilistic) Context-Free Grammars
☆25 · Jan 30, 2024 · Updated 2 years ago
JaesungHuh / av-diarization
Audio-visual diarization pipeline used for creating VoxConverse dataset
☆22 · Jun 6, 2025 · Updated last year
tommccoy1 / rnn-hierarchical-biases
Code for "Does syntax need to grow on trees? Sources of inductive bias in sequence to sequence networks"
☆24 · Jan 14, 2020 · Updated 6 years ago
rhoposit / multilingual_VQVAE
☆37 · May 8, 2021 · Updated 5 years ago
ziaoang / AutoRec
☆13 · Jun 17, 2016 · Updated 10 years ago
xinjli / alqalign
multilingual speech aligner
☆78 · Nov 19, 2023 · Updated 2 years ago
Arnontu / DeepAudioWaveformPrior
Official PyTorch implementation of the paper: "Deep Audio Waveform Prior" (Interspeech 2022) https://arxiv.org/abs/2207.10441
☆12 · Oct 25, 2022 · Updated 3 years ago
iamyuanchung / Autoregressive-Predictive-Coding
Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning
☆191 · Jan 29, 2020 · Updated 6 years ago
ttaoREtw / semi-tts
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation
☆39 · Jul 16, 2020 · Updated 6 years ago
YuanGongND / uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
☆57 · Apr 20, 2023 · Updated 3 years ago
adiyoss / AutoVowelDuration
Automatic Measurement of Vowel Duration for Consonant Vowel Consonant (CVC) sound files (JASA 2016)
☆14 · Feb 25, 2017 · Updated 9 years ago
rwth-i6 / rasr
The RWTH ASR Toolkit.
☆59 · Updated this week
YasserdahouML / VSR_test_set
WildVSR
☆22 · Dec 13, 2023 · Updated 2 years ago
Sindhu-Hegde / multivsr
Official code for the paper "Scaling Multilingual Visual Speech Recognition"
☆20 · Aug 15, 2025 · Updated 11 months ago
oncescuandreea / QuerYD_downloader
☆23 · Dec 5, 2023 · Updated 2 years ago
MiuLab / TaylorGAN
☆31 · Apr 24, 2021 · Updated 5 years ago
NoviceStone / HyperMiner
Improved Embedded Topic Models in Hyperbolic Space
☆16 · Mar 24, 2023 · Updated 3 years ago
hrbigelow / ae-wavenet
Wavenet Autoencoder for Unsupervised speech representation learning (after Chorowski, Jan 2019)
☆176 · Sep 16, 2020 · Updated 5 years ago
LeeYongHyeok / DCM_vgg_transformer
Dual cross modality attention audio-visual speech recognition model based on vgg transformer with hybrid CTC/attention architecture using…
☆14 · Jul 2, 2020 · Updated 6 years ago
ShesterG / SHuBERT
☆16 · Sep 10, 2025 · Updated 10 months ago
ZackHodari / tts_data_tools
Data processing tools for preparing speech and labels for training TTS voices
☆29 · Aug 13, 2020 · Updated 5 years ago