lezhang7/SAIL

[CVPR 2025 Highlight] Official PyTorch codebase for the paper "Assessing and Learning Alignment of Unimodal Vision and Language Models"

☆60

Alternatives and similar repositories for SAIL

Users who are interested in SAIL are comparing it to the repositories listed below.

wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆49 · Jan 14, 2025 · Updated last year
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆138 · May 8, 2025 · Updated last year
peterant330 / KUEA
[ICML'25] Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models
☆23 · Sep 7, 2025 · Updated 10 months ago
ExplainableML / flair
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
☆148 · Mar 12, 2026 · Updated 4 months ago
rabiulcste / vismin
[NeurIPS24] VisMin: Visual Minimal-Change Understanding
☆19 · Mar 3, 2025 · Updated last year
lezhang7 / Retrieval_MuGI
[EMNLP'2024 Findings] Explore generated documents for enhanced IR with LLMs. We enhance BM25 to surpass strong dense retriever on many da…
☆14 · Mar 28, 2025 · Updated last year
deepglint / UniDoc-RL
UniDoc-RL: Unified Document Understanding with Reinforcement Learning
☆16 · May 21, 2026 · Updated 2 months ago
Multimodal-Representation-Learning-MRL / GA-DMS
[EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"
☆25 · Mar 30, 2026 · Updated 3 months ago
microsoft / A-CLIP
Official Implementation of Attentive Mask CLIP (ICCV2023, https://arxiv.org/abs/2212.08653)
☆37 · May 29, 2024 · Updated 2 years ago
deepglint / Victor
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
☆29 · Aug 15, 2025 · Updated 11 months ago
iancovert / locality-alignment
☆55 · Jan 17, 2025 · Updated last year
JerryXu0129 / HyP2-Loss
☆14 · Oct 10, 2022 · Updated 3 years ago
chs20 / fuselip
FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
☆17 · Sep 8, 2025 · Updated 10 months ago
LuFan31 / CompreCap
CVPR2025: Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
☆39 · Mar 21, 2025 · Updated last year
alipay / POA
☆22 · Aug 8, 2024 · Updated last year
ExplainableML / cosmos
[CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
☆42 · Mar 27, 2025 · Updated last year
zehanwang01 / FreeBind
☆22 · Apr 22, 2025 · Updated last year
facebookresearch / webssl
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
☆214 · Mar 20, 2026 · Updated 4 months ago
lezhang7 / RiT
PyTorch implementation of RiT: Vanilla Diffusion Transformers Suffice in Representation Space
☆27 · May 23, 2026 · Updated 2 months ago
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆19 · Jun 27, 2024 · Updated 2 years ago
deepglint / DanQing
The official repo for the DanQing dataset.
☆36 · Mar 25, 2026 · Updated 4 months ago
RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆94 · Feb 13, 2024 · Updated 2 years ago
UCSB-AI / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆37 · Aug 18, 2024 · Updated last year
SivanDoveh / TSVLC
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models
☆47 · Sep 25, 2023 · Updated 2 years ago
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆52 · Jun 16, 2025 · Updated last year
LuFan31 / ET-OOD
CVPR2023: Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
☆26 · Mar 27, 2023 · Updated 3 years ago
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35 · Apr 27, 2023 · Updated 3 years ago
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆56 · Apr 7, 2025 · Updated last year
tim-learn / UEO
ICML-2024 highlight paper "Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization"
☆19 · Jul 18, 2024 · Updated 2 years ago
EvolvingLMMs-Lab / LLaVA-OneVision-1.5-RL
Fully Open Framework for Democratized Multimodal Reinforcement Learning.
☆51 · Dec 19, 2025 · Updated 7 months ago
Qinying-Liu / TagAlign
Official implementation of TagAlign
☆37 · Dec 11, 2024 · Updated last year
MIV-XJTU / FLAME
[CVPR 2025] PyTorch implementation of paper "FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training"
☆33 · Jul 8, 2025 · Updated last year
ytaek-oh / vl_compo
☆10 · Jul 5, 2024 · Updated 2 years ago
sophicle / tokens
☆19 · May 12, 2026 · Updated 2 months ago
hulianyuyy / iLLaVA
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026)
☆23 · Jun 24, 2026 · Updated last month
ivonajdenkoska / tulip
[ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"
☆32 · Jan 26, 2026 · Updated 6 months ago
rootyJeon / Vision-aligned-Latent-Reasoning
[ICML 2026] Official implementation of Vision-aligned Latent Reasoning for Multi-modal Large Language Model (VaLR)
☆20 · Apr 30, 2026 · Updated 2 months ago
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆48 · Feb 26, 2026 · Updated 5 months ago
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25 · May 14, 2026 · Updated 2 months ago