☆39 · Nov 22, 2024 · Updated last year
Alternatives and similar repositories for Multimodal-Fusion-with-Attention-Bottlenecks
Users who are interested in Multimodal-Fusion-with-Attention-Bottlenecks are comparing it to the libraries listed below.
- Deep Variational Information Bottleneck (DVIB) in PyTorch. ☆10 · Apr 25, 2020 · Updated 5 years ago
- ☆17 · Jan 1, 2024 · Updated 2 years ago
- PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scorin… ☆21 · Apr 3, 2024 · Updated last year
- Baseline system for CNVSRC2023 (Chinese Continuous Visual Speech Recognition Challenge 2023) ☆22 · Apr 27, 2024 · Updated last year
- ☆29 · Aug 22, 2024 · Updated last year
- PyTorch implementation of the models described in the IEEE ICASSP 2022 paper "Is cross-attention preferable to self-attention for multi-m… ☆64 · Mar 29, 2025 · Updated 11 months ago
- PyTorch implementation of Conformer with a training script for end-to-end speech recognition on the LibriSpeech dataset. ☆28 · May 1, 2024 · Updated last year
- Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an… ☆35 · Jun 20, 2023 · Updated 2 years ago
- Synthesizer Self-Attention is a very recent alternative to causal self-attention that has potential benefits by removing this dot product… ☆14 · Dec 29, 2024 · Updated last year
- Transfer Learning ☆10 · Aug 3, 2018 · Updated 7 years ago
- PyTorch implementation of "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video" (ICCV 2021) ☆20 · Apr 11, 2022 · Updated 3 years ago
- Anki add-on that adds Pinyin and Zhuyin readings above Chinese characters in any field. ☆12 · Sep 23, 2025 · Updated 5 months ago
- Official implementation of the paper "LTrack: Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Rep… ☆12 · Jul 26, 2023 · Updated 2 years ago
- Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation ☆12 · Dec 5, 2025 · Updated 3 months ago
- ☆11 · Oct 29, 2024 · Updated last year
- ☆10 · May 14, 2024 · Updated last year
- The code for AAAI 2025 “Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation” ☆15 · Jan 3, 2025 · Updated last year
- ☆10 · May 12, 2023 · Updated 2 years ago
- Dataset and pre-trained model of EMNLP-IJCNLP 2019 paper "TalkDown: A Corpus for Condescension Detection in Context." ☆10 · Jan 26, 2020 · Updated 6 years ago
- Recently, image classification has drawn the attention of many researchers. The need for object recognition is growing drastically, especially in the … ☆11 · May 14, 2017 · Updated 8 years ago
- Methods to extract tracks from time-frequency distributions; tracks can represent instantaneous frequency (IF) laws ☆10 · May 11, 2016 · Updated 9 years ago
- Arctic sea ice interannual variability and change ☆11 · Mar 26, 2018 · Updated 7 years ago
- This repository contains speaker-label information for the VoxCeleb2 and LRS3 audio-visual datasets (AAAI 2025). ☆13 · Sep 6, 2024 · Updated last year
- Companion code for Awe the Audience: How the Narrative Trajectories Affect Audience Perception in Public Speaking ☆14 · Jan 6, 2018 · Updated 8 years ago
- Official implementation of DGP-based multi-speaker speech synthesis with PyTorch ☆24 · Mar 23, 2021 · Updated 4 years ago
- Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization ☆21 · Jan 27, 2026 · Updated last month
- ☆15 · Dec 7, 2025 · Updated 3 months ago
- Official repository for ACM Multimedia'24 paper "MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube a… ☆18 · Aug 11, 2024 · Updated last year
- Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers" ☆14 · Feb 24, 2025 · Updated last year
- EmoCapCLIP: Learning Transferable Facial Emotion Representations from Large-Scale Semantically Rich Captions ☆20 · Jul 29, 2025 · Updated 7 months ago
- ☆14 · Jul 27, 2022 · Updated 3 years ago
- Official code repo of SimMLM [ICCV 2025] ☆21 · Dec 1, 2025 · Updated 3 months ago
- ☆11 · Jul 18, 2022 · Updated 3 years ago
- [ACL 2025] RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios ☆23 · Jul 2, 2025 · Updated 8 months ago
- The Sea Ice Evaluation Tool (SITool) is a performance metrics and diagnostics tool developed to evaluate the model skills in simulating t… ☆12 · May 17, 2023 · Updated 2 years ago
- Comparing the performance of different InfoNCE-type losses used in contrastive learning. ☆14 · Jun 12, 2024 · Updated last year
- [Neurocomputing] EmoVerse: Enhancing Multimodal Large Language Models for Affective Computing via Multitask Learning ☆16 · Jul 6, 2025 · Updated 8 months ago
- Image Features and Matching - SIFT and SURF ☆11 · Sep 5, 2020 · Updated 5 years ago
- Code for "Salient Deconvolutional Networks, Aravindh Mahendran, Andrea Vedaldi, ECCV 2016" ☆12 · Sep 28, 2016 · Updated 9 years ago