mbzuai-oryx / ALM-Bench
[CVPR 2025 🔥] ALM-Bench is a multilingual, multimodal, and culturally diverse benchmark spanning 100 languages across 19 categories. It assesses the next generation of LMMs for cultural inclusivity.
⭐39 · Updated last week
Alternatives and similar repositories for ALM-Bench
Users interested in ALM-Bench are comparing it to the repositories listed below.
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ⭐84 · Updated 3 months ago
- [ACL 2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models ⭐77 · Updated last week
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ⭐44 · Updated 8 months ago
- [ACL 2025 🔥] Time Travel is a Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts ⭐18 · Updated 2 weeks ago
- Holistic evaluation of multimodal foundation models ⭐47 · Updated 9 months ago
- ⭐41 · Updated 10 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ⭐34 · Updated 9 months ago
- Official PyTorch Implementation for "Vision-Language Models Create Cross-Modal Task Representations", ICML 2025 ⭐26 · Updated last month
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ⭐89 · Updated 2 weeks ago
- Code for "Enhancing In-context Learning via Linear Probe Calibration"β35Updated last year
- Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"β15Updated last month
- Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25)β73Updated last week
- Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"β55Updated 9 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"β28Updated 7 months ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challengesβ30Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."β42Updated 7 months ago
- [ICLR'25 Oral] MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Modelsβ34Updated 7 months ago
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Studyβ14Updated 6 months ago
- [CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"… ⭐21 · Updated 2 months ago
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ⭐13 · Updated 5 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ⭐27 · Updated 2 weeks ago
- ⭐22 · Updated 5 months ago
- Matryoshka Multimodal Models ⭐108 · Updated 4 months ago
- Official Repository of Personalized Visual Instruct Tuning ⭐28 · Updated 3 months ago
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ⭐86 · Updated last year
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation) ⭐16 · Updated last year
- [CVPRW-25 MMFM] Official repository of the paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo… ⭐48 · Updated 9 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ⭐26 · Updated 7 months ago
- Official Implementation of DiffCLIP: Differential Attention Meets CLIP ⭐30 · Updated 2 months ago
- ⭐55 · Updated 7 months ago