ys-zong / awesome-self-supervised-multimodal-learning
[T-PAMI] A curated list of self-supervised multimodal learning resources.
☆275 · Updated Aug 16, 2024
Alternatives and similar repositories for awesome-self-supervised-multimodal-learning
Users interested in awesome-self-supervised-multimodal-learning compare it to the libraries listed below.
- [ICML 2024] Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations · ☆15 · Updated Oct 28, 2023
- ☆547 · Updated Nov 7, 2024
- Official code for the NeurIPS 2023 paper "Learning Unseen Modality Interaction" · ☆18 · Updated Jan 22, 2024
- Benchmarking Multi-Image Understanding in Vision and Language Models · ☆12 · Updated Jul 29, 2024
- ☆13 · Updated May 12, 2025
- [WSDM 2024 Oral] Official PyTorch implementation for the paper "Intent Contrastive Learning with Cross Subsequences for Sequential Re…" · ☆39 · Updated Jan 7, 2024
- List of papers combining self-supervision and continual learning · ☆76 · Updated Mar 12, 2025
- [MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models · ☆290 · Updated Jul 18, 2025
- Reading list for research topics in multimodal machine learning · ☆6,814 · Updated Aug 20, 2024
- Multi-domain Recommendation with Adapter Tuning · ☆34 · Updated Mar 21, 2024
- (WACV'24) Kaizen: Practical self-supervised continual learning with continual fine-tuning · ☆16 · Updated Oct 29, 2024
- Code for the paper "No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance" [NeurI…] · ☆94 · Updated Apr 29, 2024
- ☆17 · Updated Aug 7, 2024
- ☆18 · Updated Oct 28, 2025
- Repository for "MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance" (ICML 2024) · ☆54 · Updated Jun 28, 2024
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs · ☆52 · Updated Dec 7, 2025
- Latest Advances on Multimodal Large Language Models · ☆17,337 · Updated Feb 7, 2026
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models · ☆155 · Updated Apr 30, 2024
- ☆20 · Updated Apr 27, 2023
- A curated list of prompt-based papers in computer vision and vision-language learning · ☆928 · Updated Dec 18, 2023
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" · ☆52 · Updated Oct 19, 2024
- [ICLR 2024 Spotlight] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" · ☆247 · Updated Jan 17, 2024
- [CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities · ☆101 · Updated Mar 13, 2024
- A survey of multimodal learning research · ☆333 · Updated Aug 22, 2023
- A list of related work applying MVC methods in practical applications · ☆18 · Updated Dec 14, 2023
- Code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" (CVPR 2025) · ☆21 · Updated Feb 27, 2025
- [Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897) · ☆353 · Updated Apr 23, 2025
- An efficient tuning method for VLMs · ☆80 · Updated Mar 10, 2024
- Official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding" · ☆62 · Updated Nov 5, 2024
- Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning · ☆169 · Updated Sep 26, 2022
- Relevant papers summarized in the survey "A Systematic Survey of Prompt Engineering on Vision-Language Foundation …" · ☆510 · Updated Mar 18, 2025
- [AAAI 2023 Oral] Official PyTorch implementation of "Towards Good Practices for Missing Modality Robust Action Recognition" · ☆23 · Updated Dec 1, 2022
- Official implementation of Graph Mixer Networks · ☆20 · Updated Dec 5, 2023
- Mamba R1: a novel architecture combining the efficiency of Mamba's state space models with the scalability of Mixture of Ex… · ☆25 · Updated Oct 13, 2025
- [Pattern Recognition 25] CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks · ☆462 · Updated Mar 1, 2025
- ☆643 · Updated Feb 15, 2024
- A collection of research papers on multimodal representation learning, all of which have b… · ☆84 · Updated Jun 16, 2025
- [NeurIPS 2025] Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…" · ☆40 · Updated Feb 20, 2025
- Official implementation of "ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding" · ☆39 · Updated Mar 16, 2025