BriansIDP / AudioVisualLLM
☆17Updated 9 months ago
Alternatives and similar repositories for AudioVisualLLM:
Users who are interested in AudioVisualLLM are comparing it to the repositories listed below.
- The dataset and baseline code for Text-to-Audio Grounding (TAG)☆42Updated last month
- Source code for the paper 'Audio Captioning Transformer'☆53Updated 3 years ago
- Code for A Large-scale Dataset for Audio-Language Representation Learning☆11Updated 5 months ago
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image …☆80Updated 8 months ago
- ☆17Updated last year
- ☆25Updated 4 months ago
- The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"☆19Updated last year
- ☆26Updated 6 months ago
- This repo contains a script to download the MUSIC dataset from YouTube☆8Updated last year
- Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".☆54Updated last year
- PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated 3 weeks ago
- [2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line☆28Updated last year
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Updated 3 weeks ago
- Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval'☆43Updated 2 years ago
- Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".☆49Updated 2 years ago
- ☆43Updated 3 weeks ago
- Code for the C2KD paper (ICASSP 2023)☆19Updated last year
- PyTorch implementation for “V2C: Visual Voice Cloning”☆30Updated 2 years ago
- Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.☆41Updated last month
- ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.☆56Updated 3 years ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆21Updated 6 months ago
- Towards Long Form Audio-visual Video Understanding☆13Updated 3 months ago
- A dataset for Audio-Visual Sound Event Detection in Movies☆27Updated 2 years ago
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…☆48Updated 5 months ago
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi…☆36Updated 6 months ago
- Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)☆52Updated last year
- ☆11Updated last year
- Repository containing the code and models from the paper: data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student traini…☆11Updated 11 months ago
- Multi-Scale Attention for Audio Question Answering☆28Updated last year