zailongchen / Audio-Visual-Question-Answering-AVQA

This task is based on the MUSIC-AVQA dataset. We focus on improving accuracy on the AVQA task, which aims to answer questions about different visual objects, sounds, and their associations in videos. The problem requires comprehensive multimodal understanding and spatio-temporal reasoning over audio-visual scenes.
12 · Updated 2 years ago

Alternatives and similar repositories for Audio-Visual-Question-Answering-AVQA

Users who are interested in Audio-Visual-Question-Answering-AVQA are comparing it to the libraries listed below.
