Zi-hao-Wei/Efficient-Vision-Language-Pre-training-by-Cluster-Masking

[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.

☆33

Alternatives and similar repositories for Efficient-Vision-Language-Pre-training-by-Cluster-Masking

Users who are interested in Efficient-Vision-Language-Pre-training-by-Cluster-Masking are comparing it to the repositories listed below.

ThinamXx / cuda-mode
Making of CUDA kernels
☆17 · May 27, 2025 · Updated last year
WangFei-2019 / SNARE
Project for SNARE benchmark
☆11 · Jun 5, 2024 · Updated 2 years ago
hellomuffin / exif-as-language
Official repo for the paper "EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata"
☆53 · Nov 3, 2023 · Updated 2 years ago
ChenyuHeidiZhang / VL-commonsense
☆14 · May 23, 2022 · Updated 4 years ago
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆40 · Apr 18, 2025 · Updated last year
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
omipan / svl_adapter
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
☆21 · Jan 11, 2024 · Updated 2 years ago
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
ytaek-oh / vl_compo
☆10 · Jul 5, 2024 · Updated 2 years ago
jonkahana / CLIPPR
An official PyTorch implementation for CLIPPR
☆31 · Jul 22, 2023 · Updated 3 years ago
yifeisu / TG-GAT
Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation, AVDN Challenge, ICCV CLVL 2023.
☆21 · Jan 2, 2024 · Updated 2 years ago
Tanveer81 / RGNet
This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos
☆20 · Mar 3, 2025 · Updated last year
Victorwz / VaLM
VaLM: Visually-augmented Language Modeling. ICLR 2023.
☆56 · Mar 6, 2023 · Updated 3 years ago
GraphPKU / CoI
Chain of Images for Intuitively Reasoning
☆10 · Nov 29, 2023 · Updated 2 years ago
WongiPark0628 / RAL
[ICCVW'23] Robust Asymmetric Loss for Multi-Label Long-Tailed Learning
☆19 · Oct 3, 2023 · Updated 2 years ago
lxasqjc / MCPL
MCPL: MULTI-CONCEPT PROMPT LEARNING
☆20 · May 27, 2024 · Updated 2 years ago
linzhiqiu / visual_gpt_score
VisualGPTScore for visio-linguistic reasoning
☆27 · Oct 7, 2023 · Updated 2 years ago
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆19 · Jun 27, 2024 · Updated 2 years ago
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆33 · Mar 26, 2025 · Updated last year
QUVA-Lab / PIN
Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
☆26 · Jan 14, 2025 · Updated last year
Hritikbansal / videocon
☆58 · Apr 24, 2024 · Updated 2 years ago
alipay / POA
☆22 · Aug 8, 2024 · Updated last year
VisionXLab / ProCLIP
Official PyTorch implementation of ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
☆25 · Dec 4, 2025 · Updated 7 months ago
naver / unic
PyTorch code and pretrained weights for the UNIC models.
☆45 · Aug 29, 2024 · Updated last year
jiyounglee-0523 / VisAlign
☆20 · Apr 23, 2024 · Updated 2 years ago
ashshaksharifdeen / O-TPT
CVPR'25 official code for O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
☆16 · Sep 19, 2025 · Updated 10 months ago
modestyachts / ImageNetV2_pytorch
ImageNetV2 PyTorch Dataset
☆44 · Apr 17, 2023 · Updated 3 years ago
BatsResearch / ex2
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
☆17 · Apr 4, 2024 · Updated 2 years ago
jiquan123 / TIER
TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment
☆10 · Mar 1, 2025 · Updated last year
mayug / 0-shot-llm-vision
This repository contains the code for our CVPR 2024 paper,
☆16 · Aug 27, 2024 · Updated last year
elisakreiss / concadia
☆16 · Jan 3, 2023 · Updated 3 years ago
AlonMendelson / SGVL
☆17 · Dec 13, 2023 · Updated 2 years ago
microsoft / A-CLIP
Official Implementation of Attentive Mask CLIP (ICCV2023, https://arxiv.org/abs/2212.08653)
☆37 · May 29, 2024 · Updated 2 years ago
facebookresearch / SIEVE
SIEVE: Multimodal Dataset Pruning using Image-Captioning Models (CVPR 2024)
☆21 · Apr 28, 2024 · Updated 2 years ago
ytaek-oh / fsc-clip
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
☆24 · Oct 8, 2024 · Updated last year
advanc3dUA / WohnungSuchen
🏠🔍 Automatically checks for new apartments in Hamburg from various real estate providers
☆16 · Apr 15, 2026 · Updated 3 months ago
HKUST-LongGroup / CoMM
[CVPR 2025 Highlight] Official repository for CoMM Dataset
☆56 · Dec 31, 2024 · Updated last year
XMUDeepLIT / LLaVE
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
☆78 · May 23, 2025 · Updated last year
Becomebright / MTV
Revisiting Multi-Task Visual Representation Learning
☆22 · Jan 21, 2026 · Updated 6 months ago