MAVD is a Mandarin Audio-Visual dataset with Depth information. It covers multiple modalities, including audio, RGB images, and depth images.
☆20Apr 22, 2024Updated last year
Alternatives and similar repositories for MAVD
Users who are interested in MAVD are comparing it to the repositories listed below
- ☆27Jun 27, 2023Updated 2 years ago
- An unofficial (PyTorch) implementation for the paper Deep Lip Reading: A comparison of models and an online application.☆10May 13, 2020Updated 5 years ago
- [Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units☆47Oct 26, 2024Updated last year
- Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023)☆22Apr 27, 2024Updated last year
- Reproduction of the new paper by the Wav2Lip authors☆20Jun 20, 2023Updated 2 years ago
- This is official inference code of PD-FGC☆100Oct 15, 2023Updated 2 years ago
- ☆62Jun 28, 2023Updated 2 years ago
- ☆31Oct 29, 2024Updated last year
- Code for "SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces" ACM MM 2023☆30Jul 29, 2023Updated 2 years ago
- [AAAI 2026] FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation☆64Aug 20, 2025Updated 6 months ago
- Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices☆73Apr 7, 2024Updated last year
- A PyTorch implementation of D3Net.☆11Aug 8, 2021Updated 4 years ago
- arxiv daily for speech translation, legal. Ref: Vincentqyw/cv-arxiv-daily☆15Jan 6, 2025Updated last year
- ☆33Mar 17, 2023Updated 2 years ago
- A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods☆146Sep 18, 2023Updated 2 years ago
- Anki add-on that adds Pinyin and Zhuyin readings above Chinese characters in any field.☆12Sep 23, 2025Updated 5 months ago
- The code for AAAI 2025 “Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation”☆15Jan 3, 2025Updated last year
- Russian phonetical transcription☆11Nov 19, 2025Updated 3 months ago
- Code for the paper "IFFNeRF: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model"☆12May 26, 2024Updated last year
- ☆62Jul 1, 2025Updated 8 months ago
- LLIA - Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models☆148Jun 11, 2025Updated 8 months ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT☆40Aug 29, 2024Updated last year
- Frm-Hpe: Full-Range Markerless Head Pose Estimation☆53Sep 9, 2024Updated last year
- Visually-Aware Audio Captioning☆43Mar 3, 2023Updated 2 years ago
- [ICASSP 2023] Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations☆40Dec 18, 2023Updated 2 years ago
- Just a suturing monster project.☆42Nov 21, 2023Updated 2 years ago
- vertex and uv texture map☆12Mar 13, 2023Updated 2 years ago
- [🔥ACM MM2025] EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation☆23Dec 30, 2025Updated 2 months ago
- HippoMM: Hippocampal-inspired Multimodal Memory☆15May 22, 2025Updated 9 months ago
- ☆12May 22, 2022Updated 3 years ago
- ☆10Nov 19, 2023Updated 2 years ago
- A Python client for Deepgram's Voice Agent API☆10Oct 14, 2025Updated 4 months ago
- ☆11May 28, 2023Updated 2 years ago
- [ICASSP 2023] This repository includes the official project of C2FVL, presented in our paper: COARSE-TO-FINE COVID-19 SEGMENTATION VIA VI…☆12Sep 18, 2025Updated 5 months ago
- This repo has moved to https://github.com/haosulab/ManiSkill☆15May 28, 2025Updated 9 months ago
- [ACM-MM 2025 Workshop] More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment.☆25Nov 25, 2025Updated 3 months ago
- ☆13Nov 22, 2022Updated 3 years ago
- Agentic Keyframe Search for Video Question Answering☆16Apr 7, 2025Updated 10 months ago
- ☆14Jun 11, 2025Updated 8 months ago