GewelsJI / MVLT
Masked Vision-Language Transformer in Fashion
☆32Updated 11 months ago
Related projects:
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity"☆54Updated last year
- ☆32Updated 8 months ago
- ☆29Updated last year
- ☆52Updated last year
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".☆22Updated 7 months ago
- Official implementation of TagAlign☆31Updated 5 months ago
- Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)☆73Updated last month
- REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets☆11Updated 11 months ago
- HIRL: A General Framework for Hierarchical Image Representation Learning (http://arxiv.org/abs/2205.13159)☆40Updated 2 years ago
- [FGVC9-CVPR 2022] The second place solution for 2nd eBay eProduct Visual Search Challenge.☆26Updated 2 years ago
- ☆30Updated 7 months ago
- ☆31Updated 3 months ago
- Turning to Video for Transcript Sorting☆44Updated last year
- [ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning☆36Updated last year
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆58Updated 2 weeks ago
- ☆24Updated last year
- A curated list of papers and resources for text-to-image evaluation.☆26Updated last year
- A Python toolkit for the OmniLabel benchmark providing code for evaluation and visualization☆21Updated last month
- Official implementation of Data-Free Sketch-Based Image Retrieval, CVPR 2023.☆24Updated last year
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆40Updated last year
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training☆15Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆30Updated last month
- ☆44Updated last year
- ☆20Updated 9 months ago
- Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024…☆22Updated last month
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆28Updated last year
- FuseCap: Large Language Model for Visual Data Fusion in Enriched Caption Generation☆47Updated 5 months ago
- [ECCV 2022] This repository includes the official implementation of our paper "In Defense of Image Pre-Training for Spatiotemporal Recogniti…☆19Updated last year
- Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"☆16Updated last year