kyegomez / MultiModalCrossAttn
An open-source implementation of the cross-attention mechanism from the paper "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"
☆22 · Updated 8 months ago
Related projects
Alternatives and complementary repositories for MultiModalCrossAttn
- Implementation of the paper: "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" in PyTorch ☆11 · Updated this week
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆39 · Updated last week
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆35 · Updated 3 months ago
- An official repo for the paper "Adapting Language-Audio Models as Few-Shot Audio Learners" ☆28 · Updated last year
- PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆24 · Updated last week
- A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset (ACL 2024) ☆14 · Updated last month
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer ☆34 · Updated last week
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆62 · Updated 2 weeks ago
- BLSP-Emo: Towards Empathetic Large Speech-Language Models ☆36 · Updated 5 months ago
- Implementation of Qformer from BLIP2 in Zeta Lego blocks. ☆31 · Updated last week
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆28 · Updated 3 weeks ago
- EMO-SUPERB submission ☆28 · Updated 2 months ago
- SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer ☆96 · Updated this week
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 ☆57 · Updated 2 months ago
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation". ☆60 · Updated 3 months ago
- Implementation of Google's USM speech model in PyTorch ☆25 · Updated last week
- PyTorch implementation for “V2C: Visual Voice Cloning” ☆30 · Updated last year
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆32 · Updated last month
- Official release of StyleTalk dataset. ☆57 · Updated 4 months ago
- Source code for the paper 'Audio Captioning Transformer' ☆50 · Updated 2 years ago
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆16 · Updated 2 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆48 · Updated 2 weeks ago
- [ACM MM 2023] Official PyTorch implementation of "Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Reco… ☆10 · Updated last year
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ☆81 · Updated 2 weeks ago