Tiago-Roxo/WASD

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Tiago-Roxo/WASD)

Tiago-Roxo / WASD

☆20

Alternatives and similar repositories for WASD

Users that are interested in WASD are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

okankop / ASDNet
View on GitHub
Audio-Visual Active Speaker Detection with PyTorch on AVA-ActiveSpeaker dataset
☆73Jan 18, 2022Updated 4 years ago
Junhua-Liao / Light-ASD
View on GitHub
The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)
☆181Mar 23, 2025Updated last year
Jiang-Yidi / TS-TalkNet
View on GitHub
INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues
☆61May 29, 2023Updated 3 years ago
clovaai / lookwhostalking
View on GitHub
Look Who’s Talking: Active Speaker Detection in the Wild
☆76Aug 24, 2023Updated 2 years ago
aispeech-lab / advr-avss
View on GitHub
Pytorch implementation of our paper: Audio-Visual Speech Separation with Visual Features Enhanced by Adversarial Training.
☆18Jul 11, 2022Updated 4 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
showlab / AVA-AVD
View on GitHub
☆22Nov 24, 2022Updated 3 years ago
fuankarion / active-speakers-context
View on GitHub
Code for the Active Speakers in Context Paper (CVPR2020)
☆58May 19, 2021Updated 5 years ago
zaocan666 / DyViSE
View on GitHub
Dynamic vision-guided speaker embedding for audio-visual speaker diarization
☆12Jul 5, 2022Updated 4 years ago
X-LANCE / MSDWILD
View on GitHub
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆65Jan 24, 2024Updated 2 years ago
stoneMo / OneAVM
View on GitHub
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12Jun 1, 2023Updated 3 years ago
choijeongsoo / av2av
View on GitHub
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
☆48Sep 6, 2024Updated last year
sangmin-git / MMSI
View on GitHub
Code for "Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations" (CVPR 2024 Oral)
☆19Jun 23, 2024Updated 2 years ago
zcxu-eric / AVA-AVD
View on GitHub
☆51Nov 24, 2022Updated 3 years ago
sungnyun / avsr-temporal-dynamics
View on GitHub
(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
☆13Oct 22, 2024Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
cvdfoundation / ava-dataset
View on GitHub
The AVA dataset densely annotates 80 atomic visual actions in 351k movie clips with actions localized in space and time, resulting in 1.6…
☆347Feb 9, 2022Updated 4 years ago
YasserdahouML / VSR_test_set
View on GitHub
WildVSR
☆22Dec 13, 2023Updated 2 years ago
ojoaobrito / ExplainablePR
View on GitHub
Code repository for the paper "A Deep Adversarial Framework for Visually Explainable Periocular Recognition" - CVPR 2021 Biometrics Works…
☆16Feb 7, 2025Updated last year
DaoD / KPN
View on GitHub
SIGIR 2021: Proactive Retrieval-based Chatbots based on Relevant Knowledge and Goals
☆11Jul 30, 2021Updated 4 years ago
plnguyen2908 / UniTalk-ASD-code
View on GitHub
[Interspeech 2026] Revisiting Active Speaker Detection: An In-the-Wild Benchmark for Generalization and Robustness
☆21Jun 25, 2026Updated 3 weeks ago
kaistmm / TalkNCE
View on GitHub
Official implementation of TalkNCE (ICASSP 2024).
☆18Apr 30, 2025Updated last year
walkoncross / voxceleb2-download-zyf
View on GitHub
Tools for downloading VoxCeleb2 dataset
☆35Mar 16, 2024Updated 2 years ago
Sindhu-Hegde / multivsr
View on GitHub
Official code for the paper "Scaling Multilingual Visual Speech Recognition"
☆20Aug 15, 2025Updated 11 months ago
dengyang17 / LLM-Proactive
View on GitHub
☆15Nov 23, 2023Updated 2 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
deeplsd / Merkel-Podcast-Corpus
View on GitHub
This dataset is presented in the paper Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video…
☆12Sep 21, 2022Updated 3 years ago
IntelLabs / GraVi-T
View on GitHub
Graph learning framework for long-term video understanding
☆72Jul 13, 2026Updated last week
jyjunmcl / Depth-Map-Decomposition
View on GitHub
☆10Sep 11, 2022Updated 3 years ago
liyunlongaaa / AD-TUNING
View on GitHub
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in th…
☆11Feb 23, 2024Updated 2 years ago
uark-cviu / Right2Talk
View on GitHub
[ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach
☆20Aug 2, 2021Updated 4 years ago
stoneMo / EZ-VSL
View on GitHub
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
☆41Oct 2, 2022Updated 3 years ago
jlazarow / learning_instance_occlusion
View on GitHub
Code for the CVPR 2020 paper "Learning Instance Occlusion for Panoptic Segmentation"
☆13Jun 17, 2020Updated 6 years ago
EricLee8 / MPD_EMVI
View on GitHub
Official implementation of our paper at ACL 2023: Pre-training Multi-party Dialogue Models with Latent Discourse Inference
☆10Jul 10, 2023Updated 3 years ago
wencheng256 / BRNet
View on GitHub
☆13Nov 15, 2022Updated 3 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
xiaoxiaomiao323 / MSA
View on GitHub
☆16Feb 19, 2026Updated 5 months ago
pengzhendong / speaker-diarization
View on GitHub
Offline Speaker Diarization with SenseVoice by Sherpa ONNX.
☆15Dec 23, 2024Updated last year
jksingh07 / Garbage-Detection
View on GitHub
This project was built during the competition of Smart India Hackathon 2020. In This I am using a Android device's Camera to detect Garba…
☆12Apr 5, 2023Updated 3 years ago
Junhua-Liao / LR-ASD
View on GitHub
The repository for Springer IJCV 2025 (LR-ASD: Lightweight and Robust Network for Active Speaker Detection)
☆131Mar 23, 2025Updated last year
nguyenvulebinh / AVSRCocktail
View on GitHub
Audio-Visual Speech Recognition
☆26Jul 7, 2025Updated last year
hxixixh / mix-and-localize
View on GitHub
☆23Mar 20, 2024Updated 2 years ago
roger-tseng / av-superb
View on GitHub
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58Apr 17, 2024Updated 2 years ago