kakaobrain / coyo-align
ALIGN trained on COYO-dataset
☆29 · Updated 11 months ago
Alternatives and similar repositories for coyo-align:
Users interested in coyo-align are comparing it to the repositories listed below.
- ViT trained on COYO-Labeled-300M dataset ☆32 · Updated 2 years ago
- [ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT ☆55 · Updated 7 months ago
- [BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment" ☆55 · Updated 2 years ago
- A huge dataset for Document Visual Question Answering ☆15 · Updated 8 months ago
- This is an official implementation of GRIT-VLP ☆21 · Updated 2 years ago
- Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language". ☆18 · Updated 3 years ago
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity" ☆56 · Updated last year
- Extended COCO Validation (ECCV) Caption dataset (ECCV 2022) ☆56 · Updated last year
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training ☆136 · Updated 2 years ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 · Updated 11 months ago
- Matryoshka Multimodal Models ☆98 · Updated 2 months ago
- https://arxiv.org/abs/2209.15162 ☆49 · Updated 2 years ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆64 · Updated 6 months ago
- clip retrieval benchmark ☆17 · Updated 2 years ago
- Official repository for the General Robust Image Task (GRIT) Benchmark ☆53 · Updated 2 years ago
- Official PyTorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆83 · Updated last month
- Research code for "Training Vision-Language Transformers from Captions Alone" ☆34 · Updated 2 years ago
- RO-ViT CVPR 2023 "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers" ☆18 · Updated last year
- ImageNet-12k subset of ImageNet-21k (fall11) ☆21 · Updated last year
- Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation" ☆58 · Updated last year
- [FGVC9-CVPR 2022] The second place solution for 2nd eBay eProduct Visual Search Challenge. ☆26 · Updated 2 years ago
- Masked Vision-Language Transformer in Fashion ☆33 · Updated last year
- Un-*** 50 billions multimodality dataset ☆24 · Updated 2 years ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?" ☆128 · Updated 9 months ago