iancovert / locality-alignmentLinks

☆51

Alternatives and similar repositories for locality-alignment

Users that are interested in locality-alignment are comparing it to the libraries listed below

Sorting:

m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆27Updated 3 months ago
OliverRensu / D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…
☆98Updated last year
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆55Updated 8 months ago
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆165Updated 10 months ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆57Updated 8 months ago
alibaba / conv-llava
☆118Updated last year
ChenDelong1999 / subobjects
Official repository of paper "Subobject-level Image Tokenization" (ICML-25)
☆80Updated last month
Haochen-Wang409 / TreeVGR
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
☆48Updated 3 weeks ago
UMass-Embodied-AGI / FlexAttention
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆41Updated 6 months ago
eric-ai-lab / GRIT
Official code for paper "GRIT: Teaching MLLMs to Think with Images"
☆114Updated this week
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆100Updated 2 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆53Updated last week
kaiyuyue / nxtp
[CVPR'24 Highlight] PyTorch Implementation of Object Recognition as Next Token Prediction
☆180Updated 3 months ago
Shengcao-Cao / groundLMM
Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision
☆41Updated 4 months ago
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆43Updated 6 months ago
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆45Updated last month
mlvlab / RALF
Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
☆41Updated 10 months ago
XMUDeepLIT / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆30Updated 9 months ago
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆144Updated 8 months ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.
☆28Updated last year
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆112Updated 6 months ago
AtsuMiyai / UPD
[ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
☆77Updated 2 months ago
tripletclip / TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆41Updated 8 months ago
chenshuang-zhang / imagenet_d
[CVPR 2024 Highlight] ImageNet-D
☆43Updated 9 months ago
lambert-x / ProLab
Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…
☆55Updated last year
google-research / silc
[ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation
☆45Updated 10 months ago
heliossun / SQ-LLaVA
Visual self-questioning for large vision-language assistant.
☆42Updated last week
see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
☆42Updated last year
mlvlab / DAPT
Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)
☆41Updated last year
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆37Updated 3 months ago