This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833).
☆84 · Jun 16, 2025 · Updated 8 months ago
Alternatives and similar repositories for Multimodality-Representation-Learning
Users who are interested in Multimodality-Representation-Learning are comparing it to the repositories listed below.
- ☆25 · Apr 30, 2024 · Updated last year
- [Findings of ACL'2023] Improving Contrastive Learning of Sentence Embeddings from AI Feedback ☆40 · Aug 14, 2023 · Updated 2 years ago
- Code for our project CROWN (Conversational Passage Ranking by Reasoning over Word Networks) ☆10 · Jan 11, 2024 · Updated 2 years ago
- ☆10 · Oct 28, 2019 · Updated 6 years ago
- Official code repository of paper titled "Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Visio… ☆32 · May 11, 2025 · Updated 9 months ago
- Source code of our EMNLP 2022 paper: Co-guiding Net: Achieving Mutual Guidances between Multiple Intent Detection and Slot Filling via He… ☆12 · Nov 14, 2022 · Updated 3 years ago
- [ACCV 2024] ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes 🚀🚀🚀 ☆37 · Jan 21, 2025 · Updated last year
- Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023] ☆101 · Apr 30, 2024 · Updated last year
- DSTC9 Submission ☆16 · Apr 12, 2021 · Updated 4 years ago
- [AAAI 2024] DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning ☆15 · Apr 29, 2024 · Updated last year
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆109 · Jul 15, 2023 · Updated 2 years ago
- ☆16 · Oct 21, 2024 · Updated last year
- This is a repo listing some must-read papers on *AI-driven MOOCs* or *Intelligent Education* published in recent years, mainly contribute… ☆16 · Jun 8, 2022 · Updated 3 years ago
- A Few-Shot Learning based Approach to Multimodal Social Relation Extraction ☆14 · Jan 17, 2023 · Updated 3 years ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆20 · Oct 17, 2024 · Updated last year
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆22 · Oct 14, 2025 · Updated 4 months ago
- Code for "SLIM: Explicit Slot-Intent Mapping with BERT for Joint Multi-Intent Detection and Slot Filling" ☆18 · Nov 22, 2022 · Updated 3 years ago
- ☆17 · Nov 3, 2024 · Updated last year
- A video captioning tool using S2VT method and attention mechanism (TensorFlow) ☆15 · Oct 14, 2018 · Updated 7 years ago
- [ACL 2024] A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset ☆25 · May 29, 2025 · Updated 9 months ago
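- [WIP@Oct 13] 质衡 benchmark (Q-Bench in Chinese), including Chinese-language low-level visual question answering and low-level visual description datasets, as well as image quality assessment under Chinese prompts. We will release Q-Bench in more languages in the futu… ☆24 · Jan 7, 2024 · Updated 2 years ago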
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆21 · Jul 31, 2023 · Updated 2 years ago
- Touchstone: Evaluating Vision-Language Models by Language Models ☆83 · Jan 18, 2024 · Updated 2 years ago
- ☆21 · Sep 5, 2023 · Updated 2 years ago
- Official implementation for the paper "Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation", publish… ☆20 · Jun 3, 2024 · Updated last year
- ☆39 · Jun 25, 2025 · Updated 8 months ago
- ☆21 · Jul 25, 2025 · Updated 7 months ago
- ☆21 · Aug 19, 2024 · Updated last year
- Joint learning of images and text via maximization of mutual information ☆19 · Dec 14, 2021 · Updated 4 years ago
- ☆53 · Sep 13, 2023 · Updated 2 years ago
- [NeurIPS 2022] Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings ☆22 · Jan 30, 2023 · Updated 3 years ago
- ROUGE for multilingual Summarization ☆25 · Oct 11, 2021 · Updated 4 years ago
- ☆22 · Aug 1, 2021 · Updated 4 years ago
- FusedChat is a dialogue dataset. It contains dialogue sessions fusing task-oriented dialogues and open-domain dialogues. ☆29 · Jul 20, 2022 · Updated 3 years ago
- This is the code for the Submission 3358 at NeurIPS 2022. ☆22 · Dec 21, 2022 · Updated 3 years ago
- Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training ☆30 · Jun 20, 2023 · Updated 2 years ago
- A curated list of vision-and-language pre-training (VLP). :-) ☆62 · Jul 6, 2022 · Updated 3 years ago
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion ☆59 · Feb 29, 2024 · Updated 2 years ago
- SUPERVAIZER is a toolkit built for the age of AI interoperability. At its core, it implements Google's Agent-to-Agent (A2A) protocol, ena… ☆14 · Feb 4, 2026 · Updated last month