GeWu-Lab/MWAFM


Multi-Scale Attention for Audio Question Answering

☆28

Alternatives and similar repositories for MWAFM

Users who are interested in MWAFM are comparing it to the repositories listed below.

GeWu-Lab / PSTP-Net
☆17 · Aug 11, 2023 · Updated 2 years ago
GeWu-Lab / TSPM
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
☆17 · Oct 25, 2024 · Updated last year
GeWu-Lab / MUSIC-AVQA
MUSIC-AVQA, CVPR2022 (ORAL)
☆100 · Dec 30, 2022 · Updated 3 years ago
fyyCS / LSLD
☆14 · Nov 13, 2023 · Updated 2 years ago
GeWu-Lab / CSOL_TPAMI2021
The repo for "Class-aware Sounding Objects Localization", TPAMI 2021.
☆29 · Mar 4, 2022 · Updated 4 years ago
keep-smile-001 / opentqa
opentqa is an open framework for textbook question answering, which includes xtqa, mcan, cmr, mfb, and mutan.
☆11 · Mar 27, 2021 · Updated 5 years ago
raotnameh / End-to-end-E2E-Named-Entity-Recognition-from-English-Speech
☆32 · Dec 2, 2020 · Updated 5 years ago
hltcoe / gazetteer-collection
☆12 · Mar 31, 2020 · Updated 6 years ago
v-manhlt3 / m-LTM-Audio-Text-Retrieval
☆13 · Jan 5, 2025 · Updated last year
feizc / DeeCap
Dynamic Early Exit for Image Captioning
☆17 · Oct 25, 2022 · Updated 3 years ago
MGitHubL / TMac
☆14 · Feb 26, 2024 · Updated 2 years ago
Franklin905 / VALOR
Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"
☆17 · Jul 13, 2025 · Updated last year
GeWu-Lab / Crab
[CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
☆85 · Dec 24, 2025 · Updated 6 months ago
showlab / mist
☆37 · Dec 20, 2023 · Updated 2 years ago
schowdhury671 / meerkat
☆35 · Jul 9, 2025 · Updated last year
GeWu-Lab / awesome-audiovisual-learning
A curated list of audio-visual learning methods and datasets.
☆288 · Dec 3, 2024 · Updated last year
yl3800 / TranSTR
☆12 · Dec 15, 2023 · Updated 2 years ago
WikiChao / Ego-AV-Loc
[CVPR 2023] Egocentric Audio-Visual Object Localization
☆27 · Jan 6, 2024 · Updated 2 years ago
nomonosound / log-wmse-audio-quality
logWMSE, an audio quality metric with support for digital silence target. Useful for evaluating audio source separation systems, even whe…
☆39 · Jun 24, 2025 · Updated last year
JinhuaLiang / lam4fsl
An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners"
☆31 · May 31, 2023 · Updated 3 years ago
WHB139426 / GCG
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]
☆10 · Jul 22, 2024 · Updated 2 years ago
weiguoPian / AV-CIL_ICCV2023
[ICCV 2023] Audio-Visual Class-Incremental Learning
☆35 · Sep 29, 2024 · Updated last year
sangho-vision / avbert
☆31 · Sep 20, 2021 · Updated 4 years ago
lianshiwei / datavisualization.github.io
Visualization of China's historical GDP and population data
☆13 · Jan 18, 2023 · Updated 3 years ago
liuxubo717 / LASS-demopage
☆19 · Sep 2, 2022 · Updated 3 years ago
princetonvisualai / MQVR
☆26 · Jan 12, 2022 · Updated 4 years ago
soham97 / mellow
Small audio language model for reasoning
☆88 · Dec 4, 2025 · Updated 7 months ago
jasongief / CPSP
[2022 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line
☆32 · Mar 6, 2023 · Updated 3 years ago
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆107 · Aug 11, 2023 · Updated 2 years ago
zhiyuanhubj / Long_form_VideoQA
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
☆18 · Oct 9, 2024 · Updated last year
archinetai / audio-encoders-pytorch
A collection of audio autoencoders, in PyTorch.
☆44 · Mar 7, 2023 · Updated 3 years ago
StanfordVL / Sonicverse
☆22 · Mar 18, 2023 · Updated 3 years ago
doc-doc / CoVGT
Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)
☆20 · Mar 9, 2024 · Updated 2 years ago
FloretCat / CMRAN
Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization, ACM MM 2020
☆33 · Nov 6, 2020 · Updated 5 years ago
jgoerzen / nncp
Debian packaging for NNCP [archived], moved to https://salsa.debian.org/go-team/packages/nncp
☆14 · Feb 18, 2023 · Updated 3 years ago
GaochangWu / FMF-Benchmark
This is a cross-modal benchmark for industrial anomaly detection.
☆26 · Jun 8, 2026 · Updated last month
ttgeng233 / UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆73 · Jan 4, 2026 · Updated 6 months ago
maelfabien / DataVisualization
A Data Visualization project on the French traffic accidents database
☆19 · Aug 27, 2019 · Updated 6 years ago
GeWu-Lab / MS-Bot
The official repo for "Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation", CoRL 2024 (ORAL)
☆22 · Jun 25, 2025 · Updated last year