JongSuk1 / EquiAV
☆15 Updated 4 months ago
Related projects
Alternatives and complementary repositories for EquiAV
- Vision Transformers are Parameter-Efficient Audio-Visual Learners ☆89 Updated last year
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… ☆41 Updated 2 months ago
- Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition" ☆37 Updated 2 weeks ago
- ☆18 Updated last month
- This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and … ☆34 Updated last year
- ☆14 Updated 7 months ago
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆77 Updated 5 months ago
- Official Pytorch Implementation of Our CVPR2023 Paper: "Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image… ☆52 Updated last year
- Official implementation for CIGN ☆14 Updated last year
- Official Pytorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024) ☆51 Updated 5 months ago
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024. ☆14 Updated 3 weeks ago
- The code of the paper "Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation" (CVPR2023) ☆18 Updated last year
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023) ☆37 Updated 11 months ago
- Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022) ☆30 Updated 2 years ago
- ☆14 Updated 3 years ago
- ☆22 Updated 3 months ago
- ☆28 Updated last month
- [CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners ☆128 Updated 4 months ago
- Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling". ☆21 Updated 3 months ago
- ☆16 Updated last year
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation" ☆26 Updated last month
- Locally Hierarchical Auto-Regressive Modeling for Image Generation (HQ-Transformer) ☆26 Updated 9 months ago
- Repository of the WACV'24 paper "Can CLIP Help Sound Source Localization?" ☆14 Updated 7 months ago
- [AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer ☆56 Updated 7 months ago
- This repo contains script to download MUSIC dataset from youtube ☆8 Updated 10 months ago
- Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021) ☆13 Updated 3 years ago
- ☆102 Updated 4 months ago
- Official Pytorch implementation of EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [ICML2024]. ☆21 Updated 5 months ago
- The official implementation of MAGVLT: Masked Generative Vision-and-Language Transformer (CVPR'23) ☆26 Updated 10 months ago
- [Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions ☆24 Updated this week