marslanm / Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the accompanying survey, accepted and available at https://dl.acm.org/doi/abs/10.1145/3617833.
☆68 · Updated last year
Related projects
Alternatives and complementary repositories for Multimodality-Representation-Learning
- A curated list of vision-and-language pre-training (VLP) ☆56 · Updated 2 years ago
- Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 202… ☆54 · Updated last year
- [AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations ☆113 · Updated 4 months ago
- Hate-CLIPper: Multimodal Hateful Meme Classification with Explicit Cross-modal Interaction of CLIP Features — accepted at EMNLP 2022 Work… ☆42 · Updated last year
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ☆133 · Updated 6 months ago
- [EMNLP 2023] InfoSeek: a new VQA benchmark focused on visual info-seeking questions ☆16 · Updated 5 months ago
- [ICLR 2023] Code repository for the ICLR '23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa… ☆43 · Updated 4 months ago
- ViLLA: Fine-grained vision-language representation learning from real-world data ☆39 · Updated last year
- ☆30 · Updated last month
- An automatic MLLM hallucination detection framework ☆17 · Updated last year
- Code and instructions for the baselines in the VLUE benchmark ☆41 · Updated 2 years ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆34 · Updated 2 years ago
- A PyTorch implementation of "Multimodal Few-Shot Learning with Frozen Language Models" using OPT ☆43 · Updated 2 years ago
- Reading list for Multimodal Large Language Models ☆65 · Updated last year
- ☆38 · Updated last year
- Data for evaluating GPT-4V ☆11 · Updated last year
- SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection ☆30 · Updated 2 months ago
- [ICLR 2023] MultiViz: Towards Visualizing and Understanding Multimodal Models ☆90 · Updated 2 months ago
- ☆53 · Updated 7 months ago
- MoCLE (first MLLM with MoE for instruction customization and generalization) (https://arxiv.org/abs/2312.12379) ☆29 · Updated 7 months ago
- Code and results accompanying the paper "CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets" ☆54 · Updated last year
- Implementation of the benchmark approaches for Medical Instructional Video Classification (MedVidCL) and Medical Video Question Answering… ☆28 · Updated last year
- Source code and data used in the papers ViQuAE (Lerner et al., SIGIR '22), Multimodal ICT (Lerner et al., ECIR '23), and Cross-modal Retriev… ☆26 · Updated 9 months ago
- Official implementation of "Geometric Multimodal Contrastive Representation Learning" (https://arxiv.org/abs/2202.03390) ☆26 · Updated 2 years ago
- Awesome list of vision-language prompt papers ☆36 · Updated last year
- [CVPR 2024] FairCLIP: Harnessing Fairness in Vision-Language Learning ☆49 · Updated 3 months ago
- Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning ☆123 · Updated 2 years ago
- InstructionGPT-4 ☆37 · Updated 10 months ago
- ☆100 · Updated 2 years ago
- MixGen: A New Multi-Modal Data Augmentation ☆115 · Updated last year