A curated list of different papers and datasets in various areas of audio-visual processing
☆766Jan 30, 2024Updated 2 years ago
Alternatives and similar repositories for awesome-audio-visual
Users who are interested in awesome-audio-visual are comparing it to the repositories listed below
- A curated list of audio-visual learning methods and datasets.☆286Dec 3, 2024Updated last year
- Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018☆203Apr 3, 2021Updated 4 years ago
- Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)☆90Jul 25, 2024Updated last year
- VGGSound: A Large-scale Audio-Visual Dataset☆351Sep 13, 2021Updated 4 years ago
- Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"☆115Nov 16, 2020Updated 5 years ago
- Deep-Learning-Based Audio-Visual Speech Enhancement and Separation☆219Apr 16, 2023Updated 2 years ago
- [2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line☆42Jul 5, 2022Updated 3 years ago
- Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing☆24Dec 29, 2021Updated 4 years ago
- A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.☆243Feb 15, 2024Updated 2 years ago
- Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".☆287Mar 20, 2024Updated last year
- A self-supervised learning framework for audio-visual speech☆972Dec 7, 2023Updated 2 years ago
- Vision Transformers are Parameter-Efficient Audio-Visual Learners☆106Aug 11, 2023Updated 2 years ago
- PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning☆89Jul 7, 2021Updated 4 years ago
- [CVPR 2023] Egocentric Audio-Visual Object Localization☆26Jan 6, 2024Updated 2 years ago
- 2.5D visual sound☆118Jul 25, 2023Updated 2 years ago
- A PyTorch implementation of the paper: SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification☆34Jun 25, 2021Updated 4 years ago
- ☆22Mar 20, 2024Updated last year
- Towards Long Form Audio-visual Video Understanding☆15Jan 16, 2026Updated last month
- Codebase for the paper "Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation" (ECCV2020)☆72Oct 20, 2020Updated 5 years ago
- Code for Discriminative Sounding Objects Localization (NeurIPS 2020)☆59Jan 19, 2022Updated 4 years ago
- Audio-Visual Speech Separation with Cross-Modal Consistency☆246Jul 25, 2023Updated 2 years ago
- Co-Separating Sounds of Visual Objects (ICCV 2019)☆99Jul 25, 2023Updated 2 years ago
- Spatial Audio Generation☆117Mar 24, 2023Updated 2 years ago
- Self-Supervised Speech Pre-training and Representation Learning Toolkit☆2,533Jun 13, 2025Updated 8 months ago
- A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR☆1,036Jul 5, 2023Updated 2 years ago
- The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmente…☆132Dec 4, 2023Updated 2 years ago
- Cross-modal background suppression for audio-visual event localization☆36Mar 18, 2022Updated 3 years ago
- Codebase and Dataset for the paper: Learning to Localize Sound Source in Visual Scenes☆97Dec 4, 2024Updated last year
- Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization, ACM MM 2020☆32Nov 6, 2020Updated 5 years ago
- Listen to Look: Action Recognition by Previewing Audio (CVPR 2020)☆130Aug 31, 2021Updated 4 years ago
- Official implementation for AVGN☆40Mar 24, 2023Updated 2 years ago
- Speech enhancement, speech separation, sound source localization☆1,227Nov 14, 2023Updated 2 years ago
- [ACM MM 2022] MM_Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing☆16Aug 26, 2022Updated 3 years ago
- Audio processing using PyTorch 1D convolution networks☆1,117Dec 7, 2025Updated 2 months ago
- Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments☆111Mar 19, 2024Updated last year
- Localizing Visual Sounds the Hard Way☆82Jul 6, 2022Updated 3 years ago
- A library for speech data augmentation in time-domain☆683Aug 30, 2021Updated 4 years ago
- Official implementation for MGN☆20Dec 22, 2022Updated 3 years ago
- This repo summarizes the tutorials, datasets, papers, codes and tools for speech separation and speaker extraction task. You are kindly i…☆474Jan 9, 2021Updated 5 years ago