marslanm / Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the accompanying survey, accepted and available at https://dl.acm.org/doi/abs/10.1145/3617833.
☆68 · Updated last year
Related projects
Alternatives and complementary repositories for Multimodality-Representation-Learning
- A curated list of vision-and-language pre-training (VLP) ☆56 · Updated 2 years ago
- Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 202… ☆54 · Updated last year
- [AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations ☆113 · Updated 4 months ago
- Hate-CLIPper: Multimodal Hateful Meme Classification with Explicit Cross-modal Interaction of CLIP Features — accepted at EMNLP 2022 Work… ☆42 · Updated last year
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ☆133 · Updated 6 months ago
- [EMNLP 2023] InfoSeek: a new VQA benchmark focused on visual info-seeking questions ☆16 · Updated 5 months ago
- [ICLR 2023] Code repository for the ICLR '23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa… ☆43 · Updated 4 months ago
- ViLLA: Fine-grained vision-language representation learning from real-world data ☆39 · Updated last year
- ☆30 · Updated last month
- An automatic MLLM hallucination detection framework ☆17 · Updated last year
- Code and instructions for the baselines in the VLUE benchmark ☆41 · Updated 2 years ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆34 · Updated 2 years ago
- A PyTorch implementation of "Multimodal Few-Shot Learning with Frozen Language Models" using OPT ☆43 · Updated 2 years ago
- Reading list for Multimodal Large Language Models ☆65 · Updated last year
- ☆38 · Updated last year
- Data for evaluating GPT-4V ☆11 · Updated last year
- SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection ☆30 · Updated 2 months ago
- [ICLR 2023] MultiViz: Towards Visualizing and Understanding Multimodal Models ☆90 · Updated 2 months ago
- ☆53 · Updated 7 months ago
- MoCLE (first MLLM with MoE for instruction customization and generalization) (https://arxiv.org/abs/2312.12379) ☆29 · Updated 7 months ago
- Code and results accompanying the paper "CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets" ☆54 · Updated last year
- Implementation of the benchmark approaches for Medical Instructional Video Classification (MedVidCL) and Medical Video Question Answering… ☆28 · Updated last year
- Source code and data used in the papers ViQuAE (Lerner et al., SIGIR '22), Multimodal ICT (Lerner et al., ECIR '23), and Cross-modal Retriev… ☆26 · Updated 9 months ago
- Official implementation of "Geometric Multimodal Contrastive Representation Learning" (https://arxiv.org/abs/2202.03390) ☆26 · Updated 2 years ago
- Awesome list of vision-language prompt papers ☆36 · Updated last year
- [CVPR 2024] FairCLIP: Harnessing Fairness in Vision-Language Learning ☆49 · Updated 3 months ago
- Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning ☆123 · Updated 2 years ago
- InstructionGPT-4 ☆37 · Updated 10 months ago
- ☆100 · Updated 2 years ago
- MixGen: A New Multi-Modal Data Augmentation ☆115 · Updated last year