mbzuai-oryx / ALM-Bench
[CVPR 2025 🔥] ALM-Bench is a multilingual, multimodal, and culturally diverse benchmark spanning 100 languages across 19 categories. It assesses the next generation of LMMs on cultural inclusivity.
☆40 · Updated last month
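Since ALM-Bench is a question-answer benchmark over culturally grounded images, a typical first step is loading it and checking per-language coverage. Below is a minimal sketch using the Hugging Face `datasets` library; the dataset ID `MBZUAI/ALM-Bench`, the `test` split, and the `language` column are assumptions not stated on this page — consult the repository README for the actual loading instructions.

```python
# Minimal sketch (not from the repo): load ALM-Bench via Hugging Face
# `datasets` and count questions per language. The dataset ID, split,
# and column name below are assumptions; check the repository README.
from collections import Counter

from datasets import load_dataset

ds = load_dataset("MBZUAI/ALM-Bench", split="test")  # assumed ID and split

counts = Counter(row["language"] for row in ds)  # assumed column name
for lang, n in counts.most_common(10):
    print(f"{lang}: {n} questions")
```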
Alternatives and similar repositories for ALM-Bench
Users interested in ALM-Bench are comparing it to the repositories listed below.
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 4 months ago
- Holistic evaluation of multimodal foundation models ☆48 · Updated 10 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆24 · Updated 2 months ago
- [ACL 2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models ☆77 · Updated 3 weeks ago
- [ECCV 2024] Official release of SILC: Improving Vision-Language Pretraining with Self-Distillation ☆44 · Updated 8 months ago
- Code for NOLA, an implementation of "NOLA: Compressing LoRA using Linear Combination of Random Basis" ☆56 · Updated 10 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 · Updated 10 months ago
- ☆41 · Updated 11 months ago
- ☆24 · Updated 2 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆28 · Updated 8 months ago
- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models ☆19 · Updated 2 months ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25) ☆75 · Updated last month
- Official code for the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing" ☆42 · Updated last week
- [CVPRW 2025] Official repository of the paper "Towards Evaluating the Robustness of Visual State Space Models" ☆24 · Updated 2 weeks ago
- Implementation of MC-ViT from the paper "Memory Consolidation Enables Long-Context Video Understanding" ☆20 · Updated 2 months ago
- ☆50 · Updated 5 months ago
- Auto-interpretation pipeline and other functionality for multimodal SAE analysis ☆137 · Updated 5 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆64 · Updated 11 months ago
- ☆37 · Updated 11 months ago
- ☆33 · Updated last year
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Updated last year
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 · Updated 8 months ago
- [ICLR 2025] Model Merging with SVD to Tie the KnOTS ☆57 · Updated 2 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202… ☆30 · Updated last month
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) ☆119 · Updated last year
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆15 · Updated 7 months ago
- [CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"… ☆21 · Updated 3 months ago
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models (TMLR 2025) ☆78 · Updated last month
- ☆42 · Updated 7 months ago
- Official PyTorch implementation for "Vision-Language Models Create Cross-Modal Task Representations" (ICML 2025) ☆27 · Updated last month