zengyan-97/X2-VLM

All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)

☆169

Alternatives and similar repositories for X2-VLM

Users who are interested in X2-VLM are comparing it to the repositories listed below.

zengyan-97 / X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
☆507 · Nov 25, 2022 · Updated 3 years ago
zengyan-97 / CCLM
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023)
☆93 · Jun 12, 2023 · Updated 3 years ago
MichaelZhouwang / VLUE
This repo contains code and instructions for baselines in the VLUE benchmark.
☆41 · Jul 16, 2022 · Updated 4 years ago
96-Zachary / vse_2ad
☆15 · Apr 30, 2022 · Updated 4 years ago
LgQu / TIGeR
Code for paper: Unified Text-to-Image Generation and Retrieval
☆16 · Jul 19, 2026 · Updated last week
OpenMatch / UniVL-DR
[ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…
☆52 · Jul 3, 2024 · Updated 2 years ago
salesforce / ALBEF
Code for ALBEF: a new vision-language pre-training method
☆1,755 · Sep 20, 2022 · Updated 3 years ago
jcwang0602 / PLVL
Progressive Language-guided Visual Learning for Multi-Task Visual Grounding
☆13 · May 9, 2025 · Updated last year
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,605 · Jan 24, 2024 · Updated 2 years ago
XMUDeepLIT / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆33 · Oct 12, 2024 · Updated last year
baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,685 · Aug 1, 2024 · Updated last year
mertyg / vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR …
☆294 · Jun 7, 2023 · Updated 3 years ago
VL-Group / 2022-NeurIPS-DAA
The code of the paper of "A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval" accepted b…
☆19 · Jan 16, 2024 · Updated 2 years ago
microsoft / RegionCLIP
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
☆816 · Mar 20, 2024 · Updated 2 years ago
LuminosityX / FNE
Implementation of our paper, Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination.
☆20 · Dec 3, 2023 · Updated 2 years ago
llyx97 / sparse-and-robust-PLM
[NeurIPS 2022] "A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models", Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li…
☆21 · Jan 9, 2024 · Updated 2 years ago
kugwzk / DiDE
Code for EMNLP 2022 paper “Distilled Dual-Encoder Model for Vision-Language Understanding”
☆31 · May 1, 2023 · Updated 3 years ago
xmichelleshihx / AL-LRTD
Long-range temporal dependency based active learning for surgical workflow recognition
☆10 · Apr 23, 2020 · Updated 6 years ago
jshilong / GPT4RoI
(ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556 · Jun 3, 2025 · Updated last year
adapter-hub / xGQA
☆25 · Mar 4, 2022 · Updated 4 years ago
k1rezaei / Text-to-concept
☆36 · Feb 5, 2024 · Updated 2 years ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64 · Nov 7, 2024 · Updated last year
CrossmodalGroup / ESL
☆12 · May 3, 2024 · Updated 2 years ago
uta-smile / TCL
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
☆270 · Oct 2, 2024 · Updated last year
LCFractal / TGDT
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training
☆30 · Jun 20, 2023 · Updated 3 years ago
ChiYeungLaw / LexLIP-ICCV23
Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval…
☆39 · Oct 14, 2023 · Updated 2 years ago
shikras / shikra
☆814 · Jul 8, 2024 · Updated 2 years ago
lizhou-cs / mglmm
☆32 · Jun 14, 2026 · Updated last month
fahadshamshad / deep-facial-privacy-prior
[ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors".
☆12 · Oct 11, 2024 · Updated last year
LuminosityX / HAT
Implementation of our paper, 'Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval.'
☆27 · Dec 3, 2023 · Updated 2 years ago
PRIS-CV / RelMatch
Code release for "Clue Me In: Semi-Supervised FGVC with Out-of-Distribution Data".
☆13 · Apr 11, 2022 · Updated 4 years ago
NUBagciLab / CirrMRI600Plus
CirrMRI600+: Large Scale MRI Collection and Segmentation of Cirrhotic Liver
☆25 · May 7, 2025 · Updated last year
YYJMJC / LOUPE
☆45 · Aug 14, 2023 · Updated 2 years ago
mzeeshankaramat / SafeAgents
☆20 · Jun 4, 2026 · Updated last month
TIGER-AI-Lab / UniIR
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
☆183 · Oct 1, 2024 · Updated last year
ISIC-Research / 2024-challenge-dataset
Set of scripts and instructions for sub-selecting and formatting raw data exported by the Canfield ISIC2024 Tile Export Tool. The resulti…
☆12 · May 8, 2024 · Updated 2 years ago
lerogo / aaai24_itr_cusa
Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"
☆55 · Mar 28, 2024 · Updated 2 years ago
facebookresearch / MetaCLIP
NeurIPS 2025 Spotlight; ICLR 2024 Spotlight; CVPR 2024; EMNLP 2024
☆1,850 · Nov 27, 2025 · Updated 8 months ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆138 · May 8, 2025 · Updated last year