roudimit/AVLnet

roudimit / AVLnet

Code for the AVLnet (Interspeech 2021) and Cascaded Multilingual (Interspeech 2021) papers.

☆54

Alternatives and similar repositories for AVLnet

Users who are interested in AVLnet are comparing it to the repositories listed below.

wnhsu / ResDAVEnet-VQ
Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"
☆28 · Feb 22, 2022 · Updated 4 years ago
rxtan2 / video-grounding-narrations
☆12 · Mar 12, 2023 · Updated 3 years ago
jasonppy / FaST-VGS-Family
Transformer-based visually grounded speech models
☆19 · Sep 22, 2022 · Updated 3 years ago
epic-kitchens / C5-Multi-Instance-Retrieval
☆11 · Feb 9, 2026 · Updated 4 months ago
ninatu / everything_at_once
Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval." CVPR 2022
☆115 · Jul 4, 2022 · Updated 4 years ago
uark-cviu / Right2Talk
[ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach
☆20 · Aug 2, 2021 · Updated 4 years ago
e-bug / fine-grained-evals
[ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"
☆13 · Jun 11, 2023 · Updated 3 years ago
edsonroteia / cav-mae-sync
[CVPR25] Official Implementation of CAV-MAE Sync
☆31 · Apr 5, 2026 · Updated 3 months ago
xinjli / alqalign
multilingual speech aligner
☆78 · Nov 19, 2023 · Updated 2 years ago
JeongHun0716 / e-mvsr
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024)
☆20 · Mar 17, 2025 · Updated last year
stoneMo / MGN
Official implementation for MGN
☆20 · Dec 22, 2022 · Updated 3 years ago
f90 / Mix-Wave-U-Net
Wave-U-Net for automatic (drum) mixing
☆38 · Mar 24, 2023 · Updated 3 years ago
antoine77340 / MIL-NCE_HowTo100M
PyTorch GPU distributed training code for MIL-NCE HowTo100M
☆220 · Jul 5, 2022 · Updated 4 years ago
fearofchou / mmnet
☆16 · Apr 10, 2019 · Updated 7 years ago
SpeechColab / PySpeechColab
A library of speech gadgets.
☆15 · Oct 15, 2022 · Updated 3 years ago
antoine77340 / S3D_HowTo100M
S3D Text-Video model trained on HowTo100M using MIL-NCE
☆200 · Jul 3, 2020 · Updated 6 years ago
roudimit / c2kd
Code for the C2KD paper (ICASSP 2023)
☆19 · May 15, 2023 · Updated 3 years ago
showlab / DemoVLP
[Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training
☆22 · Mar 19, 2022 · Updated 4 years ago
TengdaHan / TemporalAlignNet
[CVPR'22 Oral] Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman.
☆122 · Oct 9, 2023 · Updated 2 years ago
light1726 / BetaVAE_VC
Implementation for paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE"
☆43 · Apr 10, 2023 · Updated 3 years ago
brian7685 / Multimodal-Clustering-Network
ICCV 2021
☆34 · May 11, 2022 · Updated 4 years ago
danpovey / kaldi_lm
Old language modeling tool that's used in Kaldi
☆17 · Apr 20, 2023 · Updated 3 years ago
chorowski-lab / CPC_audio
An implementation of the Contrastive Predictive Coding (CPC) method to train audio features in an unsupervised fashion.
☆10 · Feb 22, 2022 · Updated 4 years ago
WangHelin1997 / DuTa-VC
Source code and demo for INTERSPEECH 2023 paper: DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion P…
☆38 · Dec 5, 2023 · Updated 2 years ago
revsic / torch-retriever-vc
PyTorch implementation of Retriever: Learning Content-Style Representation
☆12 · Jan 27, 2023 · Updated 3 years ago
MohammedAlghamdi / talking-heads-acm-mm
Talking Head from Speech Audio using a Pre-trained Image Generator
☆22 · May 7, 2024 · Updated 2 years ago
german-asr / kaldi-german
Scripts for training Kaldi for German speech recognition (ASR).
☆27 · Feb 11, 2021 · Updated 5 years ago
microsoft / conservative-uncertainty-estimation-random-priors
Source code for paper Conservative Uncertainty Estimation By Fitting Prior Networks (ICLR 2020)
☆22 · Nov 28, 2022 · Updated 3 years ago
WangHelin1997 / GL-AT
PyTorch implementation of the paper: A Global-local Attention Framework for Weakly Labelled Audio Tagging.
☆13 · Feb 6, 2021 · Updated 5 years ago
colaudiolab / DeepLearning4UTI
Deep Learning For Ultrasound Tongue Imaging
☆13 · Dec 17, 2024 · Updated last year
utter-project / mHuBERT-147-scripts
Collection of scripts from mHuBERT-147.
☆35 · Nov 19, 2024 · Updated last year
roger-tseng / av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58 · Apr 17, 2024 · Updated 2 years ago
showlab / all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
☆281 · Mar 25, 2023 · Updated 3 years ago
JeongHun0716 / vsr-low
Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024)
☆17 · Mar 17, 2025 · Updated last year
huaxiuyao / KGML
KGML for EMNLP 2021
☆10 · Feb 2, 2022 · Updated 4 years ago
coryshain / dnnseg
☆11 · Mar 20, 2021 · Updated 5 years ago
Hertin / WavPrompt
☆37 · Jun 30, 2022 · Updated 4 years ago
YasserdahouML / VSR_test_set
WildVSR
☆22 · Dec 13, 2023 · Updated 2 years ago
rhoposit / icassp2021
☆15 · May 8, 2021 · Updated 5 years ago