mshukor / VLPCook
Official implementation of VLPCook: Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval
☆14Updated 2 years ago
Alternatives and similar repositories for VLPCook
Users who are interested in VLPCook are comparing it to the libraries listed below
- Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)☆160Updated 2 weeks ago
- [CVPR 2022] Cross-Architecture Self-supervised Video Representation Learning☆24Updated 3 years ago
- ☆34Updated 3 years ago
- The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)☆78Updated 2 years ago
- https://layer6ai-labs.github.io/xpool/☆125Updated 2 years ago
- MixGen: A New Multi-Modal Data Augmentation☆126Updated 2 years ago
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval☆38Updated 2 years ago
- ☆27Updated 3 years ago
- Implementation of our CVPR2022 paper, Negative-Aware Attention Framework for Image-Text Matching.☆119Updated 2 years ago
- ☆33Updated 4 years ago
- code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022☆265Updated 11 months ago
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)☆34Updated 2 years ago
- Official Pytorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)☆133Updated last year
- ☆74Updated last year
- ☆16Updated 4 years ago
- Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval." CVPR 2022☆111Updated 3 years ago
- PyTorch implementation of HANet: Hierarchical Alignment Networks for Video-Text Retrieval (ACM MM 2021).☆47Updated 4 years ago
- This is the repository for the paper "One-Shot Scene Graph Generation"☆16Updated 3 years ago
- Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning☆86Updated 4 years ago
- An unofficial pytorch implementation of "TransVG: End-to-End Visual Grounding with Transformers".☆52Updated 4 years ago
- [AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”☆217Updated last year
- ☆20Updated 2 years ago
- [IJCV] AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation☆20Updated last year
- Official implementation of the Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) | ICCV 2021 - Image Retrieval o…☆39Updated last year
- CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021☆64Updated 3 years ago
- ☆24Updated 3 years ago
- [arXiv22] Disentangled Representation Learning for Text-Video Retrieval☆96Updated 3 years ago
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]☆123Updated 2 years ago
- Code of SSAN☆66Updated last year
- ☆17Updated 3 years ago