facebookresearch/diht

Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training

☆141

Alternatives and similar repositories for diht

Users who are interested in diht are comparing it to the repositories listed below.

facebookresearch / CiT
Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data".
☆78 · Jan 18, 2023 · Updated 3 years ago
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆209 · Dec 18, 2022 · Updated 3 years ago
mertyg / vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR …
☆292 · Jun 7, 2023 · Updated 2 years ago
ExplainableML / Vision_by_Language
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
☆84 · Jul 4, 2024 · Updated last year
facebookresearch / paco
This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts…
☆292 · Feb 12, 2024 · Updated 2 years ago
mlfoundations / clip_quality_not_quantity
☆29 · Oct 18, 2022 · Updated 3 years ago
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25 · Nov 23, 2024 · Updated last year
youngkyunJang / VDG
Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024
☆21 · May 30, 2024 · Updated last year
allenai / close
☆59 · Aug 30, 2023 · Updated 2 years ago
naver-ai / seit
[ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT
☆56 · Aug 12, 2024 · Updated last year
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281 · Jun 25, 2024 · Updated last year
facebookresearch / SLIP
Code release for SLIP: Self-supervision meets Language-Image Pre-training
☆787 · Feb 9, 2023 · Updated 3 years ago
mlfoundations / datacomp
DataComp: In search of the next generation of multimodal datasets
☆772 · Apr 28, 2025 · Updated 10 months ago
tsb0601 / MMVP
☆360 · Jan 27, 2024 · Updated 2 years ago
jonkahana / CLIPPR
An official PyTorch implementation for CLIPPR
☆30 · Jul 22, 2023 · Updated 2 years ago
LijieFan / LaCLIP
[NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"
☆289 · Jan 14, 2024 · Updated 2 years ago
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆55 · Apr 7, 2025 · Updated 10 months ago
CASIA-LMC-Lab / Obj2Seq
Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022)
☆85 · Nov 2, 2022 · Updated 3 years ago
kakaobrain / coyo-dataset
COYO-700M: Large-scale Image-Text Pair Dataset
☆1,252 · Nov 30, 2022 · Updated 3 years ago
facebookresearch / dmae_st
Directed masked autoencoders
☆14 · Feb 20, 2026 · Updated 2 weeks ago
Victorwz / VaLM
VaLM: Visually-augmented Language Modeling. ICLR 2023.
☆56 · Mar 6, 2023 · Updated 3 years ago
linzhiqiu / visual_gpt_score
VisualGPTScore for visio-linguistic reasoning
☆27 · Oct 7, 2023 · Updated 2 years ago
miccunifi / CIRCO
[ICCV 2023] - Composed Image Retrieval on Common Objects in context (CIRCO) dataset
☆86 · Aug 6, 2025 · Updated 7 months ago
VITA-Group / AsViT
[ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa…
☆76 · Feb 21, 2022 · Updated 4 years ago
TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆33 · Jul 21, 2023 · Updated 2 years ago
facebookresearch / SWAG
Official repository for "Revisiting Weakly Supervised Pre-Training of Visual Perception Models". https://arxiv.org/abs/2201.08371.
☆182 · Apr 17, 2022 · Updated 3 years ago
liruiw / Dec-SSL
Understanding Self-Supervised Learning in a non-IID Setting
☆21 · Oct 21, 2022 · Updated 3 years ago
UCSC-VLAA / CLIPA
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
☆319 · Jun 3, 2024 · Updated last year
baaivision / CapsFusion
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
☆213 · Feb 27, 2024 · Updated 2 years ago
facebookresearch / FFCV-SSL
FFCV-SSL: Fast Forward Computer Vision for Self-Supervised Learning.
☆212 · Aug 1, 2023 · Updated 2 years ago
microsoft / FIBER
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
☆131 · Oct 10, 2023 · Updated 2 years ago
JIA-Lab-research / Parametric-Contrastive-Learning
Parametric Contrastive Learning (ICCV2021) & GPaCo (TPAMI 2023)
☆259 · Jul 21, 2025 · Updated 7 months ago
facebookresearch / visual-counterfactuals
Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals
☆30 · Aug 14, 2022 · Updated 3 years ago
zengyan-97 / X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
☆493 · Nov 25, 2022 · Updated 3 years ago
microsoft / XPretrain
Multi-modality pre-training
☆510 · May 8, 2024 · Updated last year
princetonvisualai / icons
☆23 · Apr 24, 2025 · Updated 10 months ago
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35 · Apr 27, 2023 · Updated 2 years ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆138 · May 8, 2025 · Updated 9 months ago
facebookresearch / MetaCLIP
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,815 · Nov 27, 2025 · Updated 3 months ago