ChenDelong1999 / MEP-3M

🎁 A Large-scale Multi-modal E-Commerce Products Dataset (LTDL@IJCAI-21 Best Dataset & Pattern Recognition 2023)

☆36

Alternatives and similar repositories for MEP-3M

Users who are interested in MEP-3M are comparing it to the repositories listed below.


Deferf / CLIP_Video_Representation
Use CLIP to represent video for Retrieval Task
☆70 Updated 4 years ago
facebookresearch / vsc2022
Code for the Video Similarity Challenge.
☆80 Updated last year
zengyan-97 / CCLM
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023)
☆92 Updated 2 years ago
ABaldrati / CLIP4CirDemo
[CVPR 2022 - Demo Track] - Effective conditioned and composed image retrieval combining CLIP-based features
☆81 Updated 11 months ago
X-PLUG / mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
☆229 Updated 2 years ago
salesforce / ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆187 Updated 5 months ago
zhanxlin / Product1M
Product1M
☆88 Updated 3 years ago
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138 Updated 2 years ago
mynameischaos / Lion
Lion: Kindling Vision Intelligence within Large Language Models
☆51 Updated last year
BrandonHanx / mmf
[ECCV 2022] FashionViL: Fashion-Focused V+L Representation Learning
☆61 Updated 2 years ago
transvcl / TransVCL
TransVCL: Attention-enhanced Video Copy Localization Network with Flexible Supervision [AAAI 2023 Oral]
☆56 Updated 2 years ago
CryhanFang / CLIP2Video
☆254 Updated 2 years ago
ChenDelong1999 / polite-flamingo
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
☆63 Updated last year
X2FD / LVIS-INSTRUCT4V
☆133 Updated last year
haofanwang / natural-language-joint-query-search
Search photos on Unsplash based on OpenAI's CLIP model, support search with joint image+text queries and attention visualization.
☆223 Updated 4 years ago
zengyan-97 / X2-VLM
All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)
☆165 Updated last year
BrandonHanx / FAME-ViL
[CVPR 2023 (Highlight)] FAME-ViL: Multi-Tasking V+L Model for Heterogeneous Fashion Tasks
☆55 Updated 2 years ago
Cuberick-Orion / CIRR
Official repository of ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
☆123 Updated last week
scenarios / WeMM
☆87 Updated last year
bytedance / VTVQA
Towards Video Text Visual Question Answering: Benchmark and Baseline
☆38 Updated last year
mzhaoshuai / CenterCLIP
[SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast p…
☆133 Updated 3 years ago
microsoft / LAVENDER
A Unified Framework for Video-Language Understanding
☆60 Updated 2 years ago
microsoft / BridgeTower
Open source code for AAAI 2023 Paper "BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning"
☆166 Updated 2 years ago
LijieFan / LaCLIP
[NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"
☆287 Updated last year
zerovl / ZeroVL
[ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources
☆45 Updated 3 years ago
OFA-Sys / TouchStone
Touchstone: Evaluating Vision-Language Models by Language Models
☆83 Updated last year
TXH-mercury / COSA
[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
☆43 Updated 10 months ago
lyakaap / ISC21-Descriptor-Track-1st
The 1st Place Solution of the Facebook AI Image Similarity Challenge (ISC21) : Descriptor Track.
☆141 Updated last year
klauscc / VindLU
☆110 Updated 2 years ago
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆89 Updated 2 years ago