krantiparida / awesome-audio-visual

A curated list of different papers and datasets in various areas of audio-visual processing

☆734

Alternatives and similar repositories for awesome-audio-visual

Users who are interested in awesome-audio-visual are comparing it to the repositories listed below.

GeWu-Lab / awesome-audiovisual-learning
A curated list of audio-visual learning methods and datasets.
☆263 · Updated 6 months ago
hche11 / VGGSound
VGGSound: A Large-scale Audio-Visual Dataset
☆321 · Updated 3 years ago
facebookresearch / VisualVoice
Audio-Visual Speech Separation with Cross-Modal Consistency
☆232 · Updated last year
facebookresearch / AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
☆593 · Updated last year
YuanGongND / cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
☆260 · Updated last year
danmic / av-se
Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
☆210 · Updated 2 years ago
harritaylor / torchvggish
PyTorch port of Google Research's VGGish model used for extracting audio features.
☆392 · Updated 3 years ago
smeetrs / deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
☆233 · Updated last year
RetroCirce / HTS-Audio-Transformer
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
☆415 · Updated 10 months ago
YapengTian / AVE-ECCV18
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
☆185 · Updated 4 years ago
AndreyGuzhov / AudioCLIP
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
☆820 · Updated 3 years ago
DmitryRyumin / ICASSP-2023-24-Papers
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore t…
☆474 · Updated last month
rhgao / co-separation
Co-Separating Sounds of Visual Objects (ICCV 2019)
☆96 · Updated last year
afourast / avobjects
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
☆113 · Updated 4 years ago
microsoft / CLAP
Learning audio concepts from natural language supervision
☆567 · Updated 9 months ago
facebookresearch / Listen-to-Look
Listen to Look: Action Recognition by Previewing Audio (CVPR 2020)
☆130 · Updated 3 years ago
DmitryRyumin / INTERSPEECH-2023-24-Papers
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. …
☆674 · Updated 6 months ago
YuanGongND / ssast
Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
☆387 · Updated 2 years ago
ardasnck / learning_to_localize_sound_source
Codebase and Dataset for the paper: Learning to Localize Sound Source in Visual Scenes
☆92 · Updated 6 months ago
kkoutini / PaSST
Efficient Training of Audio Transformers with Patchout
☆339 · Updated last year
liyidi / soundnet_localize_sound_source
soundnet and localize sound source
☆11 · Updated 4 years ago
v-iashin / video_features
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and T…
☆600 · Updated 4 months ago
descriptinc / lyrebird-wav2clip
Official implementation of the paper "Wav2CLIP: Learning Robust Audio Representations from CLIP"
☆346 · Updated 3 years ago
LAION-AI / audio-dataset
Audio Dataset for training CLAP and other models
☆688 · Updated last year
wzk1015 / video-bgm-generation
[ACM MM 2021 Best Paper Award] Video Background Music Generation with Controllable Music Transformer
☆313 · Updated 2 weeks ago
v-iashin / SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
☆363 · Updated 11 months ago
speedyseal / audiosetdl
Scripts for downloading AudioSet
☆79 · Updated 7 years ago
iver56 / torch-audiomentations
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
☆1,053 · Updated 5 months ago
hangzhaomit / Sound-of-Pixels
Codebase for ECCV18 "The Sound of Pixels"
☆382 · Updated 3 years ago
YuanGongND / ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
☆440 · Updated last year