showlab/AVA-AVD

☆22

Alternatives and similar repositories for AVA-AVD

Users who are interested in AVA-AVD are comparing it to the repositories listed below.

Overcautious / ADENet
Accepted by TMM 2022
☆19 • Aug 18, 2022 • Updated 3 years ago
yangdongchao / Tim-TSENet
The source code of Tim-TSENet
☆15 • Apr 22, 2022 • Updated 4 years ago
X-LANCE / MSDWILD
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
☆65 • Jan 24, 2024 • Updated 2 years ago
zaocan666 / DyViSE
Dynamic vision-guided speaker embedding for audio-visual speaker diarization
☆12 • Jul 5, 2022 • Updated 4 years ago
Tiago-Roxo / WASD
☆20 • Mar 20, 2026 • Updated 4 months ago
SJTUwxz / LoCoNet_ASD
code repo for LoCoNet: Long-Short Context Network for Active Speaker Detection
☆57 • May 1, 2023 • Updated 3 years ago
joonson / voxconverse
Spot the conversation: speaker diarisation in the wild
☆170 • Jul 26, 2022 • Updated 3 years ago
desh2608 / diarizer
Clustering-based methods for overlapping diarization
☆84 • Jan 12, 2024 • Updated 2 years ago
jyjunmcl / Depth-Map-Decomposition
☆10 • Sep 11, 2022 • Updated 3 years ago
jlazarow / learning_instance_occlusion
Code for the CVPR 2020 paper "Learning Instance Occlusion for Panoptic Segmentation"
☆13 • Jun 17, 2020 • Updated 6 years ago
sztimhdd / Looping-Claude
This is a project that aims to use Claude.ai's coding capabilities, artifact capabilities, and project capabilities to create a new metho…
☆12 • Jan 31, 2025 • Updated last year
ddddwee1 / MMD_3D_POSE_Converter
Convert 3D Human Pose to VMD file
☆14 • Apr 21, 2019 • Updated 7 years ago
wade3han / normlens
An official codebase for "NormLens: Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Comm…
☆10 • May 9, 2024 • Updated 2 years ago
Junhua-Liao / Light-ASD
The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)
☆181 • Mar 23, 2025 • Updated last year
Hangz-nju-cuhk / Vision-Infused-Audio-Inpainter-VIAI
Code for Vision-Infused Deep Audio Inpainting (ICCV 2019)
☆58 • Oct 25, 2019 • Updated 6 years ago
Edresson / GE2E-Speaker-Encoder
GE2E Speaker Encoder - Generalized End-To-End Loss for Speaker Verification
☆14 • May 17, 2020 • Updated 6 years ago
JiwanChung / VisArgs
Corpus to accompany: "Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding"
☆11 • Apr 11, 2025 • Updated last year
hugoliv / projectvertices
☆11 • Dec 19, 2020 • Updated 5 years ago
zishen-ucap / LTX-Video-xDiT
This project is based on the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm of the diffusers and optimized and accelerate…
☆15 • Dec 31, 2024 • Updated last year
shrezaei / Target-Agnostic-Attack
Target Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning
☆10 • Jul 2, 2019 • Updated 7 years ago
Sreyan88 / LipGER
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19 • Jul 16, 2024 • Updated 2 years ago
GATECH-EIC / S3-Router
[NeurIPS 2022] "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Spee…
☆17 • Sep 19, 2023 • Updated 2 years ago
akngs / nlpip
A unix pipeline utils based on LLM
☆16 • May 15, 2023 • Updated 3 years ago
allenai / interscript
The InterScript dataset contains interactive user feedback on scripts generated by a T5-XXL model.
☆12 • Dec 15, 2021 • Updated 4 years ago
SMILE-data / SMILE
SMILE: A Multimodal Dataset for Understanding Laughter
☆13 • Jun 15, 2023 • Updated 3 years ago
jczhang02 / MUSIC_dataset_script
This repo contains script to download MUSIC dataset from youtube
☆12 • Jan 19, 2024 • Updated 2 years ago
W-Wu / DEER
☆12 • Aug 25, 2023 • Updated 2 years ago
zexupan / MuSE
☆42 • Nov 22, 2024 • Updated last year
robotology / superquadric-model
Framework for modeling and visualizing objects through superquadrics
☆10 • Feb 5, 2019 • Updated 7 years ago
JorisCos / VCTK-2Mix
☆19 • Jul 12, 2020 • Updated 6 years ago
showlab / Long-form-Video-Prior
☆32 • May 3, 2024 • Updated 2 years ago
W-Wu / ERC-SLT22
Code for "Distribution-based Emotion Recognition in Conversation"
☆18 • Feb 6, 2023 • Updated 3 years ago
RanaCM / DSU-AVO
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
☆12 • May 13, 2024 • Updated 2 years ago
DuGuYifei / PoseDetect2UnityModel
Use MHFormer [CVPR 2022] to do pose estimation and use Unity to control rig of model. (not real-time)
☆18 • Sep 14, 2022 • Updated 3 years ago
zihuixue / MFH
[ICLR 23 oral] The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation
☆44 • Jul 10, 2023 • Updated 3 years ago
stoneMo / OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 • Jun 1, 2023 • Updated 3 years ago
HENU-CS / SurvivalHandbook
HENU Survival Handbook/Study Abroad Handbook (河大生存/飞跃手册)
☆16 • Feb 8, 2026 • Updated 5 months ago
ichi131 / Direction-based-BiTSE
☆15 • Sep 19, 2024 • Updated last year
TaoRuijie / TalkNet-ASD
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
☆488 • Oct 23, 2023 • Updated 2 years ago