lzcemma / LeMDA
Code Example for Learning Multimodal Data Augmentation in Feature Space
☆41Updated last year
Alternatives and similar repositories for LeMDA:
Users who are interested in LeMDA are comparing it to the repositories listed below
- CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021☆59Updated 2 years ago
- ☆26Updated 3 years ago
- MixGen: A New Multi-Modal Data Augmentation☆119Updated 2 years ago
- ☆154Updated 2 years ago
- Official Implementation of "Geometric Multimodal Contrastive Representation Learning" (https://arxiv.org/abs/2202.03390)☆28Updated 3 weeks ago
- [AAAI 2023] Contrastive Masked Autoencoders for Self-Supervised Video Hashing☆26Updated last year
- Official implementation of "Calibrating Multimodal Learning" (ICML 2023)☆19Updated last year
- [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training☆24Updated last year
- CVPR 2022, Robust Contrastive Learning against Noisy Views☆83Updated 3 years ago
- The code for the paper "Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval" (WWW'22, Oral).☆18Updated 2 years ago
- Compress conventional Vision-Language Pre-training data☆49Updated last year
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision☆36Updated 2 years ago
- A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval☆42Updated 2 years ago
- Code for Label Propagation for Zero-shot Classification with Vision-Language Models (CVPR2024)☆36Updated 6 months ago
- Multi-label Image Recognition with Partial Labels (IJCV'24, ESWA'24, AAAI'22)☆35Updated 6 months ago
- ☆57Updated last year
- code for "Multitask Vision-Language Prompt Tuning" https://arxiv.org/abs/2211.11720☆55Updated 7 months ago
- 📍 Official pytorch implementation of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)☆52Updated last year
- ☆61Updated last year
- Code for EMNLP 2022 paper “Distilled Dual-Encoder Model for Vision-Language Understanding”☆29Updated last year
- An Enhanced CLIP Framework for Learning with Synthetic Captions☆25Updated last month
- [CVPRW22] Official Implementation of T-Food: "Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval". Accept…☆30Updated 2 years ago
- Code and results accompanying our paper titled CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets☆55Updated last year
- Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval". CVPR 2022☆98Updated 2 years ago
- Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models☆46Updated last year
- [NeurIPS 2023] Factorized Contrastive Learning: Going Beyond Multi-view Redundancy☆63Updated last year
- A curated list of vision-and-language pre-training (VLP). :-)☆56Updated 2 years ago
- Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)☆32Updated last year
- [TMLR 2022] High-Modality Multimodal Transformer☆110Updated 2 months ago
- This repository is an implementation for the loss function proposed in https://arxiv.org/pdf/2110.06848.pdf.☆111Updated 3 years ago