This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://dl.acm.org/doi/abs/10.1145/3617833 .
☆84Jun 16, 2025Updated 9 months ago
Alternatives and similar repositories for Multimodality-Representation-Learning
Users that are interested in Multimodality-Representation-Learning are comparing it to the libraries listed below
Sorting:
- ☆25Mar 13, 2026Updated last week
- Large Visual Language Model(LVLM), Large Language Model(LLM), Multimodal Large Language Model(MLLM), Alignment, Agent, AI System, Survey☆21Jul 27, 2025Updated 7 months ago
- [Findings of ACL'2023] Improving Contrastive Learning of Sentence Embeddings from AI Feedback☆40Aug 14, 2023Updated 2 years ago
- Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]☆101Apr 30, 2024Updated last year
- [ACCV 2024] ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes 🚀🚀🚀☆37Jan 21, 2025Updated last year
- Official code repository of paper titled "Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Visio…☆34May 11, 2025Updated 10 months ago
- Code for paper "Cross-Domain Slot Filling as Machine Reading Comprehension" in IJCAI 2021☆11Aug 24, 2021Updated 4 years ago
- Code for Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking☆33Mar 14, 2025Updated last year
- A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes (WACV 2025)☆12Aug 11, 2025Updated 7 months ago
- ☆10Oct 28, 2019Updated 6 years ago
- Source code of our EMNLP 2022 paper: Co-guiding Net: Achieving Mutual Guidances between Multiple Intent Detection and Slot Filling via He…☆12Nov 14, 2022Updated 3 years ago
- [ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".☆315May 9, 2023Updated 2 years ago
- DSTC9 Submission☆16Apr 12, 2021Updated 4 years ago
- DDAM-PS: Diligent Domain Adaptive Mixer for Person Search -- WACV2024☆13Feb 28, 2024Updated 2 years ago
- Official repository of paper titled "UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalitie…☆161Jan 19, 2026Updated 2 months ago
- ☆70Jul 2, 2025Updated 8 months ago
- ☆11May 16, 2025Updated 10 months ago
- This is a repo listing some must-read papers on *AI-driven MOOCs* or *Intelligent Education* published in recent years, mainly contribute…☆16Jun 8, 2022Updated 3 years ago
- Code for "SLIM: Explicit Slot-Intent Mapping with BERT for Joint Multi-Intent Detection and Slot Filling"☆18Nov 22, 2022Updated 3 years ago
- A curated list of vision-and-language pre-training (VLP). :-)☆62Jul 6, 2022Updated 3 years ago
- A Hybrid Change Encoder for Remote Sensing Change Detection (IGARSS 2024)☆18Jun 26, 2024Updated last year
- A Few-Shot Learning based Approach to Multimodal Social Relation Extraction☆14Jan 17, 2023Updated 3 years ago
- A Survey on Benchmarks of Multimodal Large Language Models☆150Jul 1, 2025Updated 8 months ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners"☆109Jul 15, 2023Updated 2 years ago
- ☆35Jan 9, 2025Updated last year
- [CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".☆303Apr 3, 2024Updated last year
- Official implementation for the paper "Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation", publish…☆20Jun 3, 2024Updated last year
- The dataset and code for PeerSum at EMNLP'23.☆16Oct 20, 2025Updated 5 months ago
- Validating image classification benchmark results on ViTs and ResNets (v2)☆13Nov 3, 2022Updated 3 years ago
- A video captioning tool using S2VT method and attention mechanism (TensorFlow)☆15Oct 14, 2018Updated 7 years ago
- [InterSpeech 2024] Official code repository of paper titled "Bird Whisperer: Leveraging Large Pre-trained Acoustic Model for Bird Call Cl…☆38Dec 11, 2024Updated last year
- 知识图谱基础设施☆11Jul 25, 2022Updated 3 years ago
- ☆11Oct 29, 2024Updated last year
- ☆27Sep 3, 2024Updated last year
- [EMNLP 2024] Official repository for paper "From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis"☆21Oct 15, 2024Updated last year
- Implementation for NeurIPS 2024 paper "SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models" (ht…☆14Dec 23, 2024Updated last year
- Code for Findings of ACL 2022 Paper "Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors"☆26Jun 15, 2022Updated 3 years ago
- ☆53Sep 13, 2023Updated 2 years ago
- ☆17Nov 18, 2024Updated last year