HaoYang0123 / Creative_Generation_Pipeline
☆24Updated last year
Related projects
Alternatives and complementary repositories for Creative_Generation_Pipeline
- Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval…☆41Updated last year
- Product1M☆86Updated 2 years ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆73Updated 9 months ago
- Research Code for Multimodal-Cognition Team in Ant Group☆123Updated 4 months ago
- Narrative movie understanding benchmark☆59Updated 6 months ago
- Bling's Object detection tool☆56Updated last year
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆40Updated last year
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval☆38Updated last year
- Diffusion Models for Generative Outfit Recommendation☆20Updated 2 months ago
- TagGPT: Large Language Models are Zero-shot Multimodal Taggers☆61Updated last year
- Multi-domain Recommendation with Adapter Tuning☆24Updated 8 months ago
- IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT☆23Updated 3 months ago
- ☆157Updated last year
- code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022☆260Updated last month
- [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives☆16Updated 3 weeks ago
- ☆43Updated 2 years ago
- [ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…☆48Updated 4 months ago
- The implementations of various baselines in our CIKM 2022 paper: ChiQA: A Large Scale Image-based Real-World Question Answering Dataset f…☆30Updated 6 months ago
- [ECCV 2022] FashionViL: Fashion-Focused V+L Representation Learning☆58Updated 2 years ago
- PyTorch implementation of MVP: a multi-stage vision-language pre-training framework☆33Updated last year
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379)☆29Updated 7 months ago
- ☆43Updated last year
- This repo contains codes and instructions for baselines in the VLUE benchmark.☆41Updated 2 years ago
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"☆260Updated 10 months ago
- The simple demo of `Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval`☆11Updated last year
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c…☆20Updated this week
- (ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning☆22Updated last month
- LMM which strictly superset LLM embedded☆30Updated 2 weeks ago
- ☆85Updated 11 months ago
- ☆17Updated 3 months ago