kyegomez / MultiModalCrossAttn
The open-source implementation of the cross-attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"
☆22 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for MultiModalCrossAttn
- Implementation of the paper: "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" in pytorch ☆11 · Updated last week
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation". ☆60 · Updated 4 months ago
- Implementation of Qformer from BLIP2 in Zeta Lego blocks. ☆31 · Updated 2 weeks ago
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆37 · Updated 3 weeks ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆39 · Updated 5 months ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆39 · Updated last week
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆67 · Updated last week
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners" ☆28 · Updated last year
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆83 · Updated last month
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆18 · Updated 2 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆57 · Updated 3 months ago
- A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset (ACL 2024) ☆14 · Updated last month
- The official GitHub page for the survey paper "Foundation Models for Music: A Survey". ☆94 · Updated 2 months ago
- This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image … ☆77 · Updated 5 months ago
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆36 · Updated 3 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression ☆13 · Updated last month
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning ☆10 · Updated 5 months ago
- Codebase and project page for EDMSound ☆29 · Updated last year
- [ACM MM 2023] Official PyTorch implementation of "Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Reco… ☆11 · Updated last year
- Implementation of a Light Recurrent Unit in Pytorch ☆46 · Updated last month
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆29 · Updated last month
- Source code for the paper 'Audio Captioning Transformer' ☆50 · Updated 2 years ago
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆24 · Updated 2 weeks ago
- EMO-SUPERB submission ☆28 · Updated 2 months ago
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆33 · Updated last month
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS 2024) ☆13 · Updated 2 weeks ago
- PyTorch Implementation of [AudioLCM]: efficient and high-quality text-to-audio generation with a latent consistency model. ☆10 · Updated 5 months ago