ChenDelong1999 / MEP-3M
A Large-scale Multi-modal E-Commerce Products Dataset (LTDL@IJCAI-21 Best Dataset & Pattern Recognition 2023)
☆34 Updated last year
Alternatives and similar repositories for MEP-3M
Users interested in MEP-3M are comparing it to the repositories listed below.
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training ☆138 Updated 2 years ago
- Code for the Video Similarity Challenge. ☆80 Updated last year
- [CVPR 2022 - Demo Track] - Effective conditioned and composed image retrieval combining CLIP-based features ☆81 Updated 10 months ago
- Use CLIP to represent video for Retrieval Task ☆70 Updated 4 years ago
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts ☆188 Updated 4 months ago
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022) ☆34 Updated 2 years ago
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023) ☆165 Updated last year
- mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023) ☆229 Updated 2 years ago
- Official repository of ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models ☆120 Updated 3 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 Updated last year
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model ☆43 Updated 8 months ago
- Towards Video Text Visual Question Answering: Benchmark and Baseline ☆38 Updated last year
- TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision [AAAI2023 Oral] ☆55 Updated 2 years ago
- Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral) ☆64 Updated last year
- [CVPR 2023 (Highlight)] FAME-ViL: Multi-Tasking V+L Model for Heterogeneous Fashion Tasks ☆55 Updated last year
- ☆109 Updated 2 years ago
- Official Code of ECCV 2022 paper MS-CLIP ☆90 Updated 3 years ago
- [ECCV 2022] FashionViL: Fashion-Focused V+L Representation Learning ☆61 Updated 2 years ago
- [SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast p… ☆132 Updated 3 years ago
- Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023) ☆91 Updated 2 years ago
- ☆133 Updated last year
- Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone ☆129 Updated last year
- ☆26 Updated 4 years ago
- A Unified Framework for Video-Language Understanding ☆59 Updated 2 years ago
- Product1M ☆87 Updated 2 years ago
- Official PyTorch implementation of the paper "DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training". ☆58 Updated 2 years ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) ☆176 Updated 2 months ago
- Lion: Kindling Vision Intelligence within Large Language Models ☆51 Updated last year
- Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition" ☆259 Updated last year
- ☆87 Updated last year