princeton-pli / VLM_S2H

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?

☆14

Alternatives and similar repositories for VLM_S2H

Users who are interested in VLM_S2H are comparing it to the repositories listed below.

ethanlshen / HierNet
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆21 Updated last year
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆18 Updated last year
locuslab / llava-token-compression
☆42 Updated 7 months ago
Vinoground / Vinoground
☆11 Updated 8 months ago
OpenCausaLab / CELLO
☆21 Updated 7 months ago
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆38 Updated last month
aszala / EnvGen
Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)
☆34 Updated 11 months ago
WangFei-2019 / SNARE
Project for SNARE benchmark
☆11 Updated last year
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆41 Updated 4 months ago
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 Updated 2 years ago
kaistAI / Volcano
[NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat…
☆46 Updated 10 months ago
locuslab / T-MARS
Code for T-MARS data filtering
☆35 Updated last year
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆27 Updated last month
tianyi-lab / R2-T2
Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"
☆15 Updated 3 months ago
BatsResearch / ex2
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
☆17 Updated last year
yunfeixie233 / ViGaL
☆35 Updated 2 weeks ago
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28 Updated 9 months ago
UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆39 Updated 3 weeks ago
chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆64 Updated last month
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆58 Updated this week
TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆33 Updated last year
Cranial-XIX / longhorn
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆51 Updated 6 months ago
si0wang / VisVM
☆44 Updated 5 months ago
VisuLogic-Benchmark / VisuLogic-Train
☆19 Updated 2 months ago
katiekang1998 / reasoning_generalization
☆32 Updated 5 months ago
ml-jku / semantic-image-text-alignment
☆24 Updated last year
shiqichen17 / AdaptVis
GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)
☆35 Updated last month
muirbench / MuirBench
A Comprehensive Benchmark for Robust Multi-image Understanding
☆11 Updated 9 months ago
belindal / LaMPP
Code for LaMPP: Language Models as Probabilistic Priors for Perception and Action
☆37 Updated 2 years ago
Shalev-Lifshitz / MultiAgentVerification
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
☆19 Updated 3 months ago