kakaobrain / coyo-alignLinks

ALIGN trained on COYO-dataset

☆29

Alternatives and similar repositories for coyo-align

Users that are interested in coyo-align are comparing it to the libraries listed below

Sorting:

gregor-ge / mBLIP
☆87Updated last year
LAION-AI / General-GPT
☆65Updated 2 years ago
kakaobrain / coyo-vit
ViT trained on COYO-Labeled-300M dataset
☆33Updated 2 years ago
RotsteinNoam / FuseCap
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
☆55Updated last year
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138Updated 2 years ago
LAION-AI / laion50BU
Un-*** 50 billions multimodality dataset
☆22Updated 3 years ago
ryanwebster90 / snip-dedup
☆103Updated last year
facebookresearch / active_indexing
Official implementation of "Active Image Indexing"
☆59Updated 2 years ago
kakaobrain / noc
☆47Updated last year
mshukor / ViCHA
[BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"
☆55Updated 3 years ago
lucidrains / MaMMUT-pytorch
Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch
☆102Updated 2 years ago
naver-ai / seit
[ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT
☆56Updated last year
allenai / grit_official
Official repository for the General Robust Image Task (GRIT) Benchmark
☆54Updated 2 years ago
navervision / CompoDiff
Official Pytorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024)
☆87Updated 9 months ago
mlfoundations / patching
Patching open-vocabulary models by interpolating weights
☆91Updated 2 years ago
yonatanbitton / data_efficient_masked_language_modeling_for_vision_and_language
Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".
☆18Updated 4 years ago
joanrod / ocr-vqgan
OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…
☆81Updated 2 years ago
opendatalab / CLIP-Parrot-Bias
ECCV2024_Parrot Captions Teach CLIP to Spot Text
☆65Updated last year
TheoCoombes / ClipCap
Using pretrained encoder and language models to generate captions from multimedia inputs.
☆98Updated 2 years ago
sehyunkwon / ICTC
This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC)
☆91Updated last year
NVlabs / PALAVRA
☆53Updated 3 years ago
salesforce / MUST
PyTorch code for MUST
☆107Updated 6 months ago
mlfoundations / imagenet-captions
Release of ImageNet-Captions
☆51Updated 2 years ago
MIMICLab / L-Verse
L-Verse: Bidirectional Generation Between Image and Text
☆109Updated 7 months ago
redcaps-dataset / redcaps-downloader
Command-line tool for downloading and extending the RedCaps dataset.
☆50Updated last year
salesforce / LayoutDETR
The official PyTorch implementation for arXiv'23 paper 'LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer'
☆100Updated 6 months ago
weiyx16 / CLIP-pytorch
A non-JIT version implementation / replication of CLIP of OpenAI in pytorch
☆34Updated 4 years ago
google-research-datasets / videoCC-data
VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…
☆78Updated 2 years ago
allenai / gpv2
☆32Updated 3 years ago
Deferf / CLIP_Video_Representation
Use CLIP to represent video for Retrieval Task
☆70Updated 4 years ago