mbzuai-oryx / ALM-Bench
[CVPR 2025 🔥] ALM-Bench is a multilingual, multimodal, and culturally diverse benchmark spanning 100 languages across 19 categories. It assesses the next generation of LMMs on cultural inclusivity.
☆40 · Updated last month
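Since ALM-Bench is a question-answer benchmark over culturally grounded images, a typical first step is loading it and checking per-language coverage. Below is a minimal sketch using the Hugging Face `datasets` library; the dataset ID `MBZUAI/ALM-Bench`, the `test` split, and the `language` column are assumptions not stated on this page — consult the repository README for the actual loading instructions.

```python
# Minimal sketch (not from the repo): load ALM-Bench via Hugging Face
# `datasets` and count questions per language. The dataset ID, split,
# and column name below are assumptions; check the repository README.
from collections import Counter

from datasets import load_dataset

ds = load_dataset("MBZUAI/ALM-Bench", split="test")  # assumed ID and split

counts = Counter(row["language"] for row in ds)  # assumed column name
for lang, n in counts.most_common(10):
    print(f"{lang}: {n} questions")
```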
Alternatives and similar repositories for ALM-Bench
Users interested in ALM-Bench are comparing it to the repositories listed below.
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 4 months ago
- Holistic evaluation of multimodal foundation models ☆48 · Updated 10 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆24 · Updated 2 months ago
- [ACL 2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models ☆77 · Updated 3 weeks ago
- [ECCV 2024] Official release of SILC: Improving Vision-Language Pretraining with Self-Distillation ☆44 · Updated 8 months ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆56 · Updated 10 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 · Updated 10 months ago
- ☆41 · Updated 11 months ago
- ☆24 · Updated 2 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆28 · Updated 8 months ago
- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models ☆19 · Updated 2 months ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25) ☆75 · Updated last month
- Official code for the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing" ☆42 · Updated last week
- [CVPRW 2025] Official repository of the paper "Towards Evaluating the Robustness of Visual State Space Models" ☆24 · Updated 2 weeks ago
- Implementation of MC-ViT from the paper "Memory Consolidation Enables Long-Context Video Understanding" ☆20 · Updated 2 months ago
- ☆50 · Updated 5 months ago
- Auto-interpretation pipeline and other functionality for multimodal SAE analysis ☆137 · Updated 5 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆64 · Updated 11 months ago
- ☆37 · Updated 11 months ago
- ☆33 · Updated last year
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Updated last year
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 · Updated 8 months ago
- [ICLR 2025] Model Merging with SVD to Tie the KnOTS ☆57 · Updated 2 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆30 · Updated last month
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) ☆119 · Updated last year
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆15 · Updated 7 months ago
- [CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"… ☆21 · Updated 3 months ago
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models (TMLR 2025) ☆78 · Updated last month
- ☆42 · Updated 7 months ago
- Official PyTorch implementation for "Vision-Language Models Create Cross-Modal Task Representations" (ICML 2025) ☆27 · Updated last month