BriansIDP / AudioVisualLLM
☆15 Updated 5 months ago
Related projects
Alternatives and complementary repositories for AudioVisualLLM
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆76 Updated 4 months ago
- ☆16 Updated last year
- Source code for the paper 'Audio Captioning Transformer' ☆50 Updated 2 years ago
- [2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line ☆22 Updated last year
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… ☆39 Updated 2 months ago
- This repo contains a script to download the MUSIC dataset from YouTube ☆8 Updated 9 months ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆37 Updated 2 months ago
- PyTorch implementation of the model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆24 Updated this week
- PyTorch implementation for “V2C: Visual Voice Cloning” ☆30 Updated last year
- Code for the C2KD paper (ICASSP 2023) ☆16 Updated last year
- Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models". ☆54 Updated last year
- ☆29 Updated 11 months ago
- ☆14 Updated 6 months ago
- Multi-Scale Attention for Audio Question Answering ☆27 Updated last year
- ☆22 Updated 7 months ago
- 16 kHz Vocoder (HiFiGAN Codes and Pretrained Models) ☆16 Updated last year
- ☆10 Updated last year
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022 ☆109 Updated last year
- Code for A Large-scale Dataset for Audio-Language Representation Learning ☆10 Updated last month
- Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024) ☆55 Updated 3 months ago
- ☆11 Updated 3 months ago
- ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021. ☆55 Updated 2 years ago
- The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS" ☆22 Updated 7 months ago
- [ACM MM 2023] Official PyTorch implementation of "Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Reco… ☆10 Updated last year
- A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset (ACL 2024) ☆14 Updated last month
- ☆14 Updated 3 years ago
- Vision Transformers are Parameter-Efficient Audio-Visual Learners ☆85 Updated last year
- [ICLR2024] The official implementation of the paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆69 Updated 9 months ago
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners" ☆28 Updated last year