Georgelingzj/up-to-date-Vision-Language-Models

An up-to-date collection of vision-language models, mainly focused on computer vision.

☆20

Alternatives and similar repositories for up-to-date-Vision-Language-Models

Users who are interested in up-to-date-Vision-Language-Models compare it to the repositories listed below.

xfactlab / I0T
[ACL Main 2025] I0T: Embedding Standardization Method Towards Zero Modality Gap
☆12 · Jun 18, 2025 · Updated last year
naver-ai / muco
Official PyTorch implementation of MuCo: Multi-turn Contrastive Learning for Multimodal Embedding Model (CVPR 2026)
☆15 · Apr 16, 2026 · Updated 3 months ago
Aman-4-Real / awesome-multimodal-dialogue
Paper, dataset and code list for multimodal dialogue.
☆22 · Jan 2, 2025 · Updated last year
HaohanWang / HEX
Example implementation for the paper: (ICLR Oral) Learning Robust Representations by Projecting Superficial Statistics Out
☆27 · Apr 7, 2021 · Updated 5 years ago
android-nuc / 17-C-Train
C training for 17 fresh man
☆14 · Oct 28, 2017 · Updated 8 years ago
BatsResearch / fudd
Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
☆11 · Nov 15, 2023 · Updated 2 years ago
jyanln / AlignReg
☆17 · Apr 17, 2024 · Updated 2 years ago
wookiekim / CorrespondentDream
Official PyTorch implementation of CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences (CVPR 2024 Po…
☆19 · Apr 29, 2024 · Updated 2 years ago
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
☆25 · Apr 4, 2023 · Updated 3 years ago
paulgavrikov / vlm_shapebias
Official code for "Can We Talk Models Into Seeing the World Differently?" (ICLR 2025).
☆30 · Jan 26, 2025 · Updated last year
VincentDENGP / 3D-LR
Can 3D Vision-Language Models Truly Understand Natural Language?
☆20 · Mar 28, 2024 · Updated 2 years ago
agneet42 / revision
[ECCV 2024] "REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models"
☆14 · Aug 6, 2024 · Updated last year
kaist-ami / AVHBench
[ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"
☆25 · Mar 8, 2026 · Updated 4 months ago
sung-yeon-kim / R-Adapter-ECCV2024
Official PyTorch Implementation of Efficient and Versatile Robust Fine-Tuning of Zero-shot Models, ECCV 2024
☆17 · Oct 3, 2024 · Updated last year
dahyun-kang / lavg
[ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
☆51 · Sep 24, 2024 · Updated last year
sjunhongshen / Tag-LLM
☆25 · Apr 17, 2024 · Updated 2 years ago
au-revoir / model-editing-ft
☆13 · Sep 8, 2024 · Updated last year
isaaccorley / dfc2022-baseline
A simple baseline for the 2022 IEEE GRSS Data Fusion Contest (DFC2022)
☆32 · Jan 13, 2022 · Updated 4 years ago
mihirp1998 / Slot-TTA
Slot-TTA shows that test-time adaptation using slot-centric models can improve image segmentation on out-of-distribution examples.
☆26 · Jun 20, 2023 · Updated 3 years ago
Show-han / Zeroshot_REC
Official code for Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions (CVPR 2024)
☆28 · Jun 21, 2024 · Updated 2 years ago
android-nuc / NUC-Android-Works
A collection of works of members.
☆18 · Mar 4, 2020 · Updated 6 years ago
vinid / neg_clip
NegCLIP.
☆41 · Feb 6, 2023 · Updated 3 years ago
wrudman / NOTICE
☆14 · Apr 10, 2025 · Updated last year
Vrushank-Ahire / MAVEN_8th_ABAW
MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network
☆15 · Jun 13, 2025 · Updated last year
zilunzhang / StreetCLIP-Repoduce
☆13 · Jul 1, 2024 · Updated 2 years ago
daveredrum / 3d-captioning
Generate descriptions automatically for 3D shapes in ShapeNet via cross-modal joint embedding
☆15 · Jan 4, 2019 · Updated 7 years ago
mlee47 / LLMVS
Official PyTorch implementation of "Video Summarization with Large Language Models" (CVPR 2025).
☆20 · Oct 7, 2025 · Updated 9 months ago
Liu-Jinxin / ur5e_joystick_control
☆10 · Dec 15, 2024 · Updated last year
hesedjds / SQUAT
The official code for Devil's on the Edges: Selective Quad Attention for Scene Graph Generation, CVPR2023.
☆25 · Jul 17, 2023 · Updated 3 years ago
IVY-LVLM / Counterfactual-Inception
Official PyTorch Implementation for the "What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-mod…
☆20 · Sep 26, 2024 · Updated last year
Princeton-Introduction-to-Robotics / F2023
☆10 · Nov 29, 2023 · Updated 2 years ago
OMEGAFSL / MESSL
Multiform Ensemble Self-Supervised Learning for Few-Shot Remote Sensing Scene Classification
☆13 · Mar 10, 2023 · Updated 3 years ago
naver-airush / airush2021_source-code
☆11 · Oct 21, 2022 · Updated 3 years ago
Liuhong99 / Imbalanced-SSL
Code Release for "Self-supervised Learning is More Robust to Dataset Imbalance"
☆39 · Feb 11, 2022 · Updated 4 years ago
microsoft / VISOR
☆46 · Oct 27, 2023 · Updated 2 years ago
g-luo / geolocation_via_guidebook_grounding
G^3: Geolocation via Guidebook Grounding, Findings of EMNLP 2022
☆17 · Sep 10, 2024 · Updated last year
mrflogs / CraFT
Official code for ICML 2024 paper, "Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models"
☆19 · Jun 12, 2024 · Updated 2 years ago
googlebaba / GraphNOTEARS
AAAI23-Directed Acyclic Graph Structure Learning from Dynamic Graphs
☆12 · Nov 25, 2022 · Updated 3 years ago
sua-choi / CMS
[CVPR'24] Official PyTorch implementation of Contrastive Mean-Shift Learning for Generalized Category Discovery
☆50 · May 1, 2024 · Updated 2 years ago